Artificial Intelligence
Authors and titles for February 2024
Total of 2756 entries :
2-2001
2001-2756
- [2] arXiv:2402.00041 [ pdf , ps , html , other ]
-
Title: Spatial-temporal-demand clustering for solving large-scale vehicle routing problems with time windowsSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)
Abstract: Several metaheuristics use decomposition and pruning strategies to solve large-scale instances of the vehicle routing problem (VRP). Those complexity reduction techniques often rely on simple, problem-specific rules. However, the growth in available data and advances in computer hardware enable data-based approaches that use machine learning (ML) to improve scalability of solution algorithms. We propose a decompose-route-improve (DRI) framework that groups customers using clustering. Its similarity metric incorporates customers' spatial, temporal, and demand data and is formulated to reflect the problem's objective function and constraints. The resulting sub-routing problems can independently be solved using any suitable algorithm. We apply pruned local search (LS) between solved subproblems to improve the overall solution. Pruning is based on customers' similarity information obtained in the decomposition phase. In a computational study, we parameterize and compare existing clustering algorithms and benchmark the DRI against the Hybrid Genetic Search (HGS) of Vidal et al. (2013). Results show that our data-based approach outperforms classic cluster-first, route-second approaches solely based on customers' spatial information. The newly introduced similarity metric forms separate sub-VRPs and improves the selection of LS moves in the improvement phase. Thus, the DRI scales existing metaheuristics to achieve high-quality solutions faster for large-scale VRPs by efficiently reducing complexity. Further, the DRI can be easily adapted to various solution methods and VRP characteristics, such as distribution of customer locations and demands, depot location, and different time window scenarios, making it a generalizable approach to solving routing problems.
- [3] arXiv:2402.00042 [ pdf , ps , other ]
-
Title: Optimized Task Assignment and Predictive Maintenance for Industrial Machines using Markov Decision ProcessComments: 19 pages, 11 figures, 3 tablesSubjects: Artificial Intelligence (cs.AI) ; Systems and Control (eess.SY)
Abstract: This paper considers a distributed decision-making approach for manufacturing task assignment and condition-based machine health maintenance. Our approach considers information sharing between the task assignment and health management decision-making agents. We propose the design of the decision-making agents based on Markov decision processes. The key advantage of using a Markov decision process-based approach is the incorporation of uncertainty involved in the decision-making process. The paper provides detailed mathematical models along with the associated practical execution strategy. In order to demonstrate the effectiveness and practical applicability of our proposed approach, we have included a detailed numerical case study that is based on open source milling machine tool degradation data. Our case study indicates that the proposed approach offers flexibility in terms of the selection of cost parameters and it allows for offline computation and analysis of the decision-making policy. These features create and opportunity for the future work on learning of the cost parameters associated with our proposed model using artificial intelligence.
- [4] arXiv:2402.00043 [ pdf , ps , html , other ]
-
Title: Interactive and Intelligent Root Cause Analysis in Manufacturing with Causal Bayesian Networks and Knowledge GraphsSubjects: Artificial Intelligence (cs.AI) ; Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
Abstract: Root Cause Analysis (RCA) in the manufacturing of electric vehicles is the process of identifying fault causes. Traditionally, the RCA is conducted manually, relying on process expert knowledge. Meanwhile, sensor networks collect significant amounts of data in the manufacturing process. Using this data for RCA makes it more efficient. However, purely data-driven methods like Causal Bayesian Networks have problems scaling to large-scale, real-world manufacturing processes due to the vast amount of potential cause-effect relationships (CERs). Furthermore, purely data-driven methods have the potential to leave out already known CERs or to learn spurious CERs. The paper contributes by proposing an interactive and intelligent RCA tool that combines expert knowledge of an electric vehicle manufacturing process and a data-driven machine learning method. It uses reasoning over a large-scale Knowledge Graph of the manufacturing process while learning a Causal Bayesian Network. In addition, an Interactive User Interface enables a process expert to give feedback to the root cause graph by adding and removing information to the Knowledge Graph. The interactive and intelligent RCA tool reduces the learning time of the Causal Bayesian Network while decreasing the number of spurious CERs. Thus, the interactive and intelligent RCA tool closes the feedback loop between expert and machine learning method.
- [5] arXiv:2402.00046 [ pdf , ps , other ]
-
Title: Introducing PetriRL: An Innovative Framework for JSSP Resolution Integrating Petri nets and Event-based Reinforcement LearningJournal-ref: Journal of Manufacturing Systems (2024)Subjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Resource utilization and production process optimization are crucial for companies in today's competitive industrial landscape. Addressing the complexities of job shop scheduling problems (JSSP) is essential to improving productivity, reducing costs, and ensuring timely delivery. We propose PetriRL, a novel framework integrating Petri nets and deep reinforcement learning (DRL) for JSSP optimization. PetriRL capitalizes on the inherent strengths of Petri nets in modelling discrete event systems while leveraging the advantages of a graph structure. The Petri net governs automated components of the process, ensuring adherence to JSSP constraints. This allows for synergistic collaboration with optimization algorithms such as DRL, particularly in critical decision-making. Unlike traditional methods, PetriRL eliminates the need to preprocess JSSP instances into disjunctive graphs and enhances the explainability of process status through its graphical structure based on places and transitions. Additionally, the inherent graph structure of Petri nets enables the dynamic additions of job operations during the inference phase without requiring agent retraining, thus enhancing flexibility. Experimental results demonstrate PetriRL's robust generalization across various instance sizes and its competitive performance on public test benchmarks and randomly generated instances. Results are compared to a wide range of optimization solutions such as heuristics, metaheuristics, and learning-based algorithms. Finally, the added values of the framework's key elements, such as event-based control and action masking, are studied in the ablation study.
- [6] arXiv:2402.00048 [ pdf , ps , html , other ]
-
Title: IICONGRAPH: improved Iconographic and Iconological Statements in Knowledge GraphsComments: 18 pagesSubjects: Artificial Intelligence (cs.AI)
Abstract: Iconography and iconology are fundamental domains when it comes to understanding artifacts of cultural heritage. Iconography deals with the study and interpretation of visual elements depicted in artifacts and their symbolism, while iconology delves deeper, exploring the underlying cultural and historical meanings. Despite the advances in representing cultural heritage with Linked Open Data (LOD), recent studies show persistent gaps in the representation of iconographic and iconological statements in current knowledge graphs (KGs). To address them, this paper presents IICONGRAPH, a KG that was created by refining and extending the iconographic and iconological statements of ArCo (the Italian KG of cultural heritage) and Wikidata. The development of IICONGRAPH was also driven by a series of requirements emerging from research case studies that were unattainable in the non-reengineered versions of the KGs. The evaluation results demonstrate that IICONGRAPH not only outperforms ArCo and Wikidata through domain-specific assessments from the literature but also serves as a robust platform for addressing the formulated research questions. IICONGRAPH is released and documented in accordance with the FAIR principles to guarantee the resource's reusability. The algorithms used to create it and assess the research questions have also been made available to ensure transparency and reproducibility. While future work focuses on ingesting more data into the KG, and on implementing it as a backbone of LLM-based question answering systems, the current version of IICONGRAPH still emerges as a valuable asset, contributing to the evolving landscape of cultural heritage representation within Knowledge Graphs, the Semantic Web, and beyond.
- [7] arXiv:2402.00052 [ pdf , ps , html , other ]
-
Title: Zero-shot Sequential Neuro-symbolic Reasoning for Automatically Generating Architecture Schematic DesignsSubjects: Artificial Intelligence (cs.AI) ; Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Abstract: This paper introduces a novel automated system for generating architecture schematic designs aimed at streamlining complex decision-making at the multifamily real estate development project's outset. Leveraging the combined strengths of generative AI (neuro reasoning) and mathematical program solvers (symbolic reasoning), the method addresses both the reliance on expert insights and technical challenges in architectural schematic design. To address the large-scale and interconnected nature of design decisions needed for designing a whole building, we proposed a novel sequential neuro-symbolic reasoning approach, emulating traditional architecture design processes from initial concept to detailed layout. To remove the need to hand-craft a cost function to approximate the desired objectives, we propose a solution that uses neuro reasoning to generate constraints and cost functions that the symbolic solvers can use to solve. We also incorporate feedback loops for each design stage to ensure a tight integration between neuro and symbolic reasoning. Developed using GPT-4 without further training, our method's effectiveness is validated through comparative studies with real-world buildings. Our method can generate various building designs in accordance with the understanding of the neighborhood, showcasing its potential to transform the realm of architectural schematic design.
- [8] arXiv:2402.00053 [ pdf , ps , html , other ]
-
Title: Are We Wasting Time? A Fast, Accurate Performance Evaluation Framework for Knowledge Graph Link PredictorsSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: The standard evaluation protocol for measuring the quality of Knowledge Graph Completion methods - the task of inferring new links to be added to a graph - typically involves a step which ranks every entity of a Knowledge Graph to assess their fit as a head or tail of a candidate link to be added. In Knowledge Graphs on a larger scale, this task rapidly becomes prohibitively heavy. Previous approaches mitigate this problem by using random sampling of entities to assess the quality of links predicted or suggested by a method. However, we show that this approach has serious limitations since the ranking metrics produced do not properly reflect true outcomes. In this paper, we present a thorough analysis of these effects along with the following findings. First, we empirically find and theoretically motivate why sampling uniformly at random vastly overestimates the ranking performance of a method. We show that this can be attributed to the effect of easy versus hard negative candidates. Second, we propose a framework that uses relational recommenders to guide the selection of candidates for evaluation. We provide both theoretical and empirical justification of our methodology, and find that simple and fast methods can work extremely well, and that they match advanced neural approaches. Even when a large portion of true candidates for a property are missed, the estimation barely deteriorates. With our proposed framework, we can reduce the time and computation needed similar to random sampling strategies while vastly improving the estimation; on ogbl-wikikg2, we show that accurate estimations of the full, filtered ranking can be obtained in 20 seconds instead of 30 minutes. We conclude that considerable computational effort can be saved by effective preprocessing and sampling methods and still reliably predict performance accurately of the true performance for the entire ranking procedure.
- [9] arXiv:2402.00060 [ pdf , ps , html , other ]
-
Title: Treatment of Epistemic Uncertainty in Conjunction Analysis with Dempster-Shafer TheoryComments: 28 pages, 23 figuresSubjects: Artificial Intelligence (cs.AI) ; Information Theory (cs.IT); Probability (math.PR)
Abstract: The paper presents an approach to the modelling of epistemic uncertainty in Conjunction Data Messages (CDM) and the classification of conjunction events according to the confidence in the probability of collision. The approach proposed in this paper is based on the Dempster-Shafer Theory (DSt) of evidence and starts from the assumption that the observed CDMs are drawn from a family of unknown distributions. The Dvoretzky-Kiefer-Wolfowitz (DKW) inequality is used to construct robust bounds on such a family of unknown distributions starting from a time series of CDMs. A DSt structure is then derived from the probability boxes constructed with DKW inequality. The DSt structure encapsulates the uncertainty in the CDMs at every point along the time series and allows the computation of the belief and plausibility in the realisation of a given probability of collision. The methodology proposed in this paper is tested on a number of real events and compared against existing practices in the European and French Space Agencies. We will show that the classification system proposed in this paper is more conservative than the approach taken by the European Space Agency but provides an added quantification of uncertainty in the probability of collision.
- [10] arXiv:2402.00064 [ pdf , ps , html , other ]
-
Title: Merging plans with incomplete knowledge about actions and goals through an agent-based reputation systemSubjects: Artificial Intelligence (cs.AI)
Abstract: Managing transition plans is one of the major problems of people with cognitive disabilities. Therefore, finding an automated way to generate such plans would be a helpful tool for this community. In this paper we have specifically proposed and compared different alternative ways to merge plans formed by sequences of actions of unknown similarities between goals and actions executed by several operator agents which cooperate between them applying such actions over some passive elements (node agents) that require additional executions of another plan after some time of use. Such ignorance of the similarities between plan actions and goals would justify the use of a distributed recommendation system that would provide an useful plan to be applied for a certain goal to a given operator agent, generated from the known results of previous executions of different plans by other operator agents. Here we provide the general framework of execution (agent system), and the different merging algorithms applied to this problem. The proposed agent system would act as an useful cognitive assistant for people with intelectual disabilities such as autism.
- [11] arXiv:2402.00065 [ pdf , ps , other ]
-
Title: A technical note for the 91-clauses SAT resolution with Indirect QAOA based approachSubjects: Artificial Intelligence (cs.AI) ; Quantum Physics (quant-ph)
Abstract: This paper addresses the resolution of the 3-SAT problem using a QAOA-like approach. The chosen principle involves modeling the solution ranks of the 3-SAT problem, which, in this particular case, directly represent a solution. This results in a highly compact circuit with few gates, enabling the modeling of large-sized 3-SAT problems. Numerical experimentation demonstrates that the approach can solve instances composed of 91 clauses and 20 variables with an implementation based on Qiskit.
- [12] arXiv:2402.00076 [ pdf , ps , html , other ]
-
Title: Exploitation Strategies in Conditional Markov Chain Search: A case study on the three-index assignment problemComments: 14 pagesSubjects: Artificial Intelligence (cs.AI)
Abstract: The Conditional Markov Chain Search (CMCS) is a framework for automated design of metaheuristics for discrete combinatorial optimisation problems. Given a set of algorithmic components such as hill climbers and mutations, CMCS decides in which order to apply those components. The decisions are dictated by the CMCS configuration that can be learnt offline. CMCS does not have an acceptance criterion; any moves are accepted by the framework. As a result, it is particularly good in exploration but is not as good at exploitation. In this study, we explore several extensions of the framework to improve its exploitation abilities. To perform a computational study, we applied the framework to the three-index assignment problem. The results of our experiments showed that a two-stage CMCS is indeed superior to a single-stage CMCS.
- [13] arXiv:2402.00083 [ pdf , ps , html , other ]
-
Title: Modeling Access Differences to Reduce Disparity in Resource AllocationComments: Association for Computing Machinery (2022)Subjects: Artificial Intelligence (cs.AI)
Abstract: Motivated by COVID-19 vaccine allocation, where vulnerable subpopulations are simultaneously more impacted in terms of health and more disadvantaged in terms of access to the vaccine, we formalize and study the problem of resource allocation when there are inherent access differences that correlate with advantage and disadvantage. We identify reducing resource disparity as a key goal in this context and show its role as a proxy to more nuanced downstream impacts. We develop a concrete access model that helps quantify how a given allocation translates to resource flow for the advantaged vs. the disadvantaged, based on the access gap between them. We then provide a methodology for access-aware allocation. Intuitively, the resulting allocation leverages more vaccines in locations with higher vulnerable populations to mitigate the access gap and reduce overall disparity. Surprisingly, knowledge of the access gap is often not needed to perform access-aware allocation. To support this formalism, we provide empirical evidence for our access model and show that access-aware allocation can significantly reduce resource disparity and thus improve downstream outcomes. We demonstrate this at various scales, including at county, state, national, and global levels.
- [14] arXiv:2402.00262 [ pdf , ps , other ]
-
Title: Computational Experiments Meet Large Language Model Based Agents: A Survey and PerspectiveQun Ma , Xiao Xue , Deyu Zhou , Xiangning Yu , Donghua Liu , Xuwen Zhang , Zihan Zhao , Yifan Shen , Peilin Ji , Juanjuan Li , Gang Wang , Wanpeng MaSubjects: Artificial Intelligence (cs.AI)
Abstract: Computational experiments have emerged as a valuable method for studying complex systems, involving the algorithmization of counterfactuals. However, accurately representing real social systems in Agent-based Modeling (ABM) is challenging due to the diverse and intricate characteristics of humans, including bounded rationality and heterogeneity. To address this limitation, the integration of Large Language Models (LLMs) has been proposed, enabling agents to possess anthropomorphic abilities such as complex reasoning and autonomous learning. These agents, known as LLM-based Agent, offer the potential to enhance the anthropomorphism lacking in ABM. Nonetheless, the absence of explicit explainability in LLMs significantly hinders their application in the social sciences. Conversely, computational experiments excel in providing causal analysis of individual behaviors and complex phenomena. Thus, combining computational experiments with LLM-based Agent holds substantial research potential. This paper aims to present a comprehensive exploration of this fusion. Primarily, it outlines the historical development of agent structures and their evolution into artificial societies, emphasizing their importance in computational experiments. Then it elucidates the advantages that computational experiments and LLM-based Agents offer each other, considering the perspectives of LLM-based Agent for computational experiments and vice versa. Finally, this paper addresses the challenges and future trends in this research domain, offering guidance for subsequent related studies.
- [15] arXiv:2402.00468 [ pdf , ps , other ]
-
Title: RadDQN: a Deep Q Learning-based Architecture for Finding Time-efficient Minimum Radiation Exposure PathwayComments: 12 pages, 7 main figures, code link (GitHub)Subjects: Artificial Intelligence (cs.AI)
Abstract: Recent advancements in deep reinforcement learning (DRL) techniques have sparked its multifaceted applications in the automation sector. Managing complex decision-making problems with DRL encourages its use in the nuclear industry for tasks such as optimizing radiation exposure to the personnel during normal operating conditions and potential accidental scenarios. However, the lack of efficient reward function and effective exploration strategy thwarted its implementation in the development of radiation-aware autonomous unmanned aerial vehicle (UAV) for achieving maximum radiation protection. Here, in this article, we address these intriguing issues and introduce a deep Q-learning based architecture (RadDQN) that operates on a radiation-aware reward function to provide time-efficient minimum radiation-exposure pathway in a radiation zone. We propose a set of unique exploration strategies that fine-tune the extent of exploration and exploitation based on the state-wise variation in radiation exposure during training. Further, we benchmark the predicted path with grid-based deterministic method. We demonstrate that the formulated reward function in conjugation with adequate exploration strategy is effective in handling several scenarios with drastically different radiation field distributions. When compared to vanilla DQN, our model achieves a superior convergence rate and higher training stability.
- [16] arXiv:2402.00485 [ pdf , ps , html , other ]
-
Title: A Personalized Framework for Consumer and Producer Group Fairness Optimization in Recommender SystemsComments: TORS. arXiv admin note: substantial text overlap with arXiv:2204.08085Subjects: Artificial Intelligence (cs.AI) ; Computers and Society (cs.CY); Information Retrieval (cs.IR)
Abstract: In recent years, there has been an increasing recognition that when machine learning (ML) algorithms are used to automate decisions, they may mistreat individuals or groups, with legal, ethical, or economic implications. Recommender systems are prominent examples of these machine learning (ML) systems that aid users in making decisions. The majority of past literature research on RS fairness treats user and item fairness concerns independently, ignoring the fact that recommender systems function in a two-sided marketplace. In this paper, we propose CP-FairRank, an optimization-based re-ranking algorithm that seamlessly integrates fairness constraints from both the consumer and producer side in a joint objective framework. The framework is generalizable and may take into account varied fairness settings based on group segmentation, recommendation model selection, and domain, which is one of its key characteristics. For instance, we demonstrate that the system may jointly increase consumer and producer fairness when (un)protected consumer groups are defined on the basis of their activity level and main-streamness, while producer groups are defined according to their popularity level. For empirical validation, through large-scale on eight datasets and four mainstream collaborative filtering (CF) recommendation models, we demonstrate that our proposed strategy is able to improve both consumer and producer fairness without compromising or very little overall recommendation quality, demonstrating the role algorithms may play in avoiding data biases.
- [17] arXiv:2402.00491 [ pdf , ps , html , other ]
-
Title: EXMOS: Explanatory Model Steering Through Multifaceted Explanations and Data ConfigurationsComments: This is a pre-print version only for early release. Please view the conference published version from ACM CHI 2024 to get the latest version of the paperJournal-ref: Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24), May 11--16, 2024, Honolulu, HI, USASubjects: Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC)
Abstract: Explanations in interactive machine-learning systems facilitate debugging and improving prediction models. However, the effectiveness of various global model-centric and data-centric explanations in aiding domain experts to detect and resolve potential data issues for model improvement remains unexplored. This research investigates the influence of data-centric and model-centric global explanations in systems that support healthcare experts in optimising models through automated and manual data configurations. We conducted quantitative (n=70) and qualitative (n=30) studies with healthcare experts to explore the impact of different explanations on trust, understandability and model improvement. Our results reveal the insufficiency of global model-centric explanations for guiding users during data configuration. Although data-centric explanations enhanced understanding of post-configuration system changes, a hybrid fusion of both explanation types demonstrated the highest effectiveness. Based on our study results, we also present design implications for effective explanation-driven interactive machine-learning systems.
- [18] arXiv:2402.00591 [ pdf , ps , html , other ]
-
Title: Sandra -- A Neuro-Symbolic Reasoner Based On Descriptions And SituationsSubjects: Artificial Intelligence (cs.AI)
Abstract: This paper presents sandra, a neuro-symbolic reasoner combining vectorial representations with deductive reasoning. Sandra builds a vector space constrained by an ontology and performs reasoning over it. The geometric nature of the reasoner allows its combination with neural networks, bridging the gap with symbolic knowledge representations. Sandra is based on the Description and Situation (DnS) ontology design pattern, a formalization of frame semantics. Given a set of facts (a situation) it allows to infer all possible perspectives (descriptions) that can provide a plausible interpretation for it, even in presence of incomplete information. We prove that our method is correct with respect to the DnS model. We experiment with two different tasks and their standard benchmarks, demonstrating that, without increasing complexity, sandra (i) outperforms all the baselines (ii) provides interpretability in the classification process, and (iii) allows control over the vector space, which is designed a priori.
- [19] arXiv:2402.00658 [ pdf , ps , html , other ]
-
Title: Learning Planning-based Reasoning by Trajectories Collection and Process Reward SynthesizingComments: 17 pages, 9 figuresSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Large Language Models (LLMs) have demonstrated significant potential in handling complex reasoning tasks through step-by-step rationale generation. However, recent studies have raised concerns regarding the hallucination and flaws in their reasoning process. Substantial efforts are being made to improve the reliability and faithfulness of the generated rationales. Some approaches model reasoning as planning, while others focus on annotating for process supervision. Nevertheless, the planning-based search process often results in high latency due to the frequent assessment of intermediate reasoning states and the extensive exploration space. Additionally, supervising the reasoning process with human annotation is costly and challenging to scale for LLM training. To address these issues, in this paper, we propose a framework to learn planning-based reasoning through Direct Preference Optimization (DPO) on collected trajectories, which are ranked according to synthesized process rewards. Our results on challenging logical reasoning benchmarks demonstrate the effectiveness of our learning framework, showing that our 7B model can surpass the strong counterparts like GPT-3.5-Turbo.
- [20] arXiv:2402.00715 [ pdf , ps , html , other ]
-
Title: Intent Assurance using LLMs guided by Intent DriftSubjects: Artificial Intelligence (cs.AI) ; Networking and Internet Architecture (cs.NI); Methodology (stat.ME)
Abstract: Intent-Based Networking (IBN) presents a paradigm shift for network management, by promising to align intents and business objectives with network operations--in an automated manner. However, its practical realization is challenging: 1) processing intents, i.e., translate, decompose and identify the logic to fulfill the intent, and 2) intent conformance, that is, considering dynamic networks, the logic should be adequately adapted to assure intents. To address the latter, intent assurance is tasked with continuous verification and validation, including taking the necessary actions to align the operational and target states. In this paper, we define an assurance framework that allows us to detect and act when intent drift occurs. To do so, we leverage AI-driven policies, generated by Large Language Models (LLMs) which can quickly learn the necessary in-context requirements, and assist with the fulfillment and assurance of intents.
- [21] arXiv:2402.00738 [ pdf , ps , other ]
-
Title: FM3Q: Factorized Multi-Agent MiniMax Q-Learning for Two-Team Zero-Sum Markov GameSubjects: Artificial Intelligence (cs.AI)
Abstract: Many real-world applications involve some agents that fall into two teams, with payoffs that are equal within the same team but of opposite sign across the opponent team. The so-called two-team zero-sum Markov games (2t0sMGs) can be resolved with reinforcement learning in recent years. However, existing methods are thus inefficient in light of insufficient consideration of intra-team credit assignment, data utilization and computational intractability. In this paper, we propose the individual-global-minimax (IGMM) principle to ensure the coherence between two-team minimax behaviors and the individual greedy behaviors through Q functions in 2t0sMGs. Based on it, we present a novel multi-agent reinforcement learning framework, Factorized Multi-Agent MiniMax Q-Learning (FM3Q), which can factorize the joint minimax Q function into individual ones and iteratively solve for the IGMM-satisfied minimax Q functions for 2t0sMGs. Moreover, an online learning algorithm with neural networks is proposed to implement FM3Q and obtain the deterministic and decentralized minimax policies for two-team players. A theoretical analysis is provided to prove the convergence of FM3Q. Empirically, we use three environments to evaluate the learning efficiency and final performance of FM3Q and show its superiority on 2t0sMGs.
- [22] arXiv:2402.00901 [ pdf , ps , other ]
-
Title: Real Sparks of Artificial Intelligence and the Importance of Inner InterpretabilitySubjects: Artificial Intelligence (cs.AI)
Abstract: The present paper looks at one of the most thorough articles on the intelligence of GPT, research conducted by engineers at Microsoft. Although there is a great deal of value in their work, I will argue that, for familiar philosophical reasons, their methodology, !Blackbox Interpretability"#is wrongheaded. But there is a better way. There is an exciting and emerging discipline of !Inner Interpretability"#(and specifically Mechanistic Interpretability) that aims to uncover the internal activations and weights of models in order to understand what they represent and the algorithms they implement. In my view, a crucial mistake in Black-box Interpretability is the failure to appreciate that how processes are carried out matters when it comes to intelligence and understanding. I can#t pretend to have a full story that provides both necessary and sufficient conditions for being intelligent, but I do think that Inner Interpretability dovetails nicely with plausible philosophical views of what intelligence requires. So the conclusion is modest, but the important point in my view is seeing how to get the research on the right track. Towards the end of the paper, I will show how some of the philosophical concepts can be used to further refine how Inner Interpretability is approached, so the paper helps draw out a profitable, future two-way exchange between Philosophers and Computer Scientists.
- [23] arXiv:2402.01118 [ pdf , ps , html , other ]
-
Title: PokeLLMon: A Human-Parity Agent for Pokemon Battles with Large Language ModelsComments: 10 pagesSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: We introduce PokeLLMon, the first LLM-embodied agent that achieves human-parity performance in tactical battle games, as demonstrated in Pokemon battles. The design of PokeLLMon incorporates three key strategies: (i) In-context reinforcement learning that instantly consumes text-based feedback derived from battles to iteratively refine the policy; (ii) Knowledge-augmented generation that retrieves external knowledge to counteract hallucination and enables the agent to act timely and properly; (iii) Consistent action generation to mitigate the panic switching phenomenon when the agent faces a powerful opponent and wants to elude the battle. We show that online battles against human demonstrates PokeLLMon's human-like battle strategies and just-in-time decision making, achieving 49% of win rate in the Ladder competitions and 56% of win rate in the invited battles. Our implementation and playable battle logs are available at: this https URL .
- [24] arXiv:2402.01276 [ pdf , ps , html , other ]
-
Title: Federated Unlearning: a Perspective of Stability and FairnessSubjects: Artificial Intelligence (cs.AI)
Abstract: This paper explores the multifaceted consequences of federated unlearning (FU) with data heterogeneity. We introduce key metrics for FU assessment, concentrating on verification, global stability, and local fairness, and investigate the inherent trade-offs. Furthermore, we formulate the unlearning process with data heterogeneity through an optimization framework. Our key contribution lies in a comprehensive theoretical analysis of the trade-offs in FU and provides insights into data heterogeneity's impacts on FU. Leveraging these insights, we propose FU mechanisms to manage the trade-offs, guiding further development for FU mechanisms. We empirically validate that our FU mechanisms effectively balance trade-offs, confirming insights derived from our theoretical analysis.
- [25] arXiv:2402.01292 [ pdf , ps , html , other ]
-
Title: Towards the New XAI: A Hypothesis-Driven Approach to Decision Support Using EvidenceComments: 26 pagesSubjects: Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC)
Abstract: Prior research on AI-assisted human decision-making has explored several different explainable AI (XAI) approaches. A recent paper has proposed a paradigm shift calling for hypothesis-driven XAI through a conceptual framework called evaluative AI that gives people evidence that supports or refutes hypotheses without necessarily giving a decision-aid recommendation. In this paper, we describe and evaluate an approach for hypothesis-driven XAI based on the Weight of Evidence (WoE) framework, which generates both positive and negative evidence for a given hypothesis. Through human behavioural experiments, we show that our hypothesis-driven approach increases decision accuracy and reduces reliance compared to a recommendation-driven approach and an AI-explanation-only baseline, but with a small increase in under-reliance compared to the recommendation-driven approach. Further, we show that participants used our hypothesis-driven approach in a materially different way to the two baselines.
- [26] arXiv:2402.01499 [ pdf , ps , other ]
-
Title: Developing and Evaluating a Design Method for Positive Artificial IntelligenceSubjects: Artificial Intelligence (cs.AI)
Abstract: As artificial intelligence (AI) continues advancing, ensuring positive societal impacts becomes critical, especially as AI systems become increasingly ubiquitous in various aspects of life. However, developing "AI for good" poses substantial challenges around aligning systems with complex human values. Presently, we lack mature methods for addressing these challenges. This article presents and evaluates the Positive AI design method aimed at addressing this gap. The method provides a human-centered process to translate wellbeing aspirations into concrete practices. First, we explain the method's four key steps: contextualizing, operationalizing, optimizing, and implementing wellbeing supported by continuous measurement for feedback cycles. We then present a multiple case study where novice designers applied the method, revealing strengths and weaknesses related to efficacy and usability. Next, an expert evaluation study assessed the quality of the resulting concepts, rating them moderately high for feasibility, desirability, and plausibility of achieving intended wellbeing benefits. Together, these studies provide preliminary validation of the method's ability to improve AI design, while surfacing areas needing refinement like developing support for complex steps. Proposed adaptations such as examples and evaluation heuristics could address weaknesses. Further research should examine sustained application over multiple projects. This human-centered approach shows promise for realizing the vision of 'AI for Wellbeing' that does not just avoid harm, but actively benefits humanity.
- [27] arXiv:2402.01602 [ pdf , ps , html , other ]
-
Title: Foundation Model Sherpas: Guiding Foundation Models through Knowledge and ReasoningComments: 9 pagesSubjects: Artificial Intelligence (cs.AI)
Abstract: Foundation models (FMs) such as large language models have revolutionized the field of AI by showing remarkable performance in various tasks. However, they exhibit numerous limitations that prevent their broader adoption in many real-world systems, which often require a higher bar for trustworthiness and usability. Since FMs are trained using loss functions aimed at reconstructing the training corpus in a self-supervised manner, there is no guarantee that the model's output aligns with users' preferences for a specific task at hand. In this survey paper, we propose a conceptual framework that encapsulates different modes by which agents could interact with FMs and guide them suitably for a set of tasks, particularly through knowledge augmentation and reasoning. Our framework elucidates agent role categories such as updating the underlying FM, assisting with prompting the FM, and evaluating the FM output. We also categorize several state-of-the-art approaches into agent interaction protocols, highlighting the nature and extent of involvement of the various agent roles. The proposed framework provides guidance for future directions to further realize the power of FMs in practical AI systems.
- [28] arXiv:2402.01607 [ pdf , ps , html , other ]
-
Title: Natural Counterfactuals With Necessary BacktrackingSubjects: Artificial Intelligence (cs.AI) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Methodology (stat.ME)
Abstract: Counterfactual reasoning is pivotal in human cognition and especially important for providing explanations and making decisions. While Judea Pearl's influential approach is theoretically elegant, its generation of a counterfactual scenario often requires interventions that are too detached from the real scenarios to be feasible. In response, we propose a framework of natural counterfactuals and a method for generating counterfactuals that are natural with respect to the actual world's data distribution. Our methodology refines counterfactual reasoning, allowing changes in causally preceding variables to minimize deviations from realistic scenarios. To generate natural counterfactuals, we introduce an innovative optimization framework that permits but controls the extent of backtracking with a naturalness criterion. Empirical experiments indicate the effectiveness of our method.
- [29] arXiv:2402.01677 [ pdf , ps , other ]
-
Title: Embedding Ontologies via Incorporating Extensional and Intensional KnowledgeComments: Submitting to IJCAI2024; 9 pages and 3 figuresSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Ontologies contain rich knowledge within domain, which can be divided into two categories, namely extensional knowledge and intensional knowledge. Extensional knowledge provides information about the concrete instances that belong to specific concepts in the ontology, while intensional knowledge details inherent properties, characteristics, and semantic associations among concepts. However, existing ontology embedding approaches fail to take both extensional knowledge and intensional knowledge into fine consideration simultaneously. In this paper, we propose a novel ontology embedding approach named EIKE (Extensional and Intensional Knowledge Embedding) by representing ontologies in two spaces, called extensional space and intensional space. EIKE presents a unified framework for embedding instances, concepts and their relations in an ontology, applying a geometry-based method to model extensional knowledge and a pretrained language model to model intensional knowledge, which can capture both structure information and textual information. Experimental results show that EIKE significantly outperforms state-of-the-art methods in three datasets for both triple classification and link prediction, indicating that EIKE provides a more comprehensive and representative perspective of the domain.
- [30] arXiv:2402.01786 [ pdf , ps , html , other ]
-
Title: COA-GPT: Generative Pre-trained Transformers for Accelerated Course of Action Development in Military OperationsComments: Accepted at the NATO Science and Technology Organization Symposium (ICMCIS) organized by the Information Systems Technology (IST) Panel, IST-205-RSY - the ICMCIS, held in Koblenz, Germany, 23-24 April 2024Subjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Abstract: The development of Courses of Action (COAs) in military operations is traditionally a time-consuming and intricate process. Addressing this challenge, this study introduces COA-GPT, a novel algorithm employing Large Language Models (LLMs) for rapid and efficient generation of valid COAs. COA-GPT incorporates military doctrine and domain expertise to LLMs through in-context learning, allowing commanders to input mission information - in both text and image formats - and receive strategically aligned COAs for review and approval. Uniquely, COA-GPT not only accelerates COA development, producing initial COAs within seconds, but also facilitates real-time refinement based on commander feedback. This work evaluates COA-GPT in a military-relevant scenario within a militarized version of the StarCraft II game, comparing its performance against state-of-the-art reinforcement learning algorithms. Our results demonstrate COA-GPT's superiority in generating strategically sound COAs more swiftly, with added benefits of enhanced adaptability and alignment with commander intentions. COA-GPT's capability to rapidly adapt and update COAs during missions presents a transformative potential for military planning, particularly in addressing planning discrepancies and capitalizing on emergent windows of opportunities.
- [31] arXiv:2402.01817 [ pdf , ps , html , other ]
-
Title: LLMs Can't Plan, But Can Help Planning in LLM-Modulo FrameworksSubbarao Kambhampati , Karthik Valmeekam , Lin Guan , Kaya Stechly , Mudit Verma , Siddhant Bhambri , Lucas Saldyt , Anil MurthySubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: There is considerable confusion about the role of Large Language Models (LLMs) in planning and reasoning tasks. On one side are over-optimistic claims that LLMs can indeed do these tasks with just the right prompting or self-verification strategies. On the other side are perhaps over-pessimistic claims that all that LLMs are good for in planning/reasoning tasks are as mere translators of the problem specification from one syntactic format to another, and ship the problem off to external symbolic solvers. In this position paper, we take the view that both these extremes are misguided. We argue that auto-regressive LLMs cannot, by themselves, do planning or self-verification (which is after all a form of reasoning), and shed some light on the reasons for misunderstandings in the literature. We will also argue that LLMs should be viewed as universal approximate knowledge sources that have much more meaningful roles to play in planning/reasoning tasks beyond simple front-end/back-end format translators. We present a vision of {\bf LLM-Modulo Frameworks} that combine the strengths of LLMs with external model-based verifiers in a tighter bi-directional interaction regime. We will show how the models driving the external verifiers themselves can be acquired with the help of LLMs. We will also argue that rather than simply pipelining LLMs and symbolic components, this LLM-Modulo Framework provides a better neuro-symbolic approach that offers tighter integration between LLMs and symbolic components, and allows extending the scope of model-based planning/reasoning regimes towards more flexible knowledge, problem and preference specifications.
- [32] arXiv:2402.01889 [ pdf , ps , html , other ]
-
Title: The Role of Foundation Models in Neuro-Symbolic Learning and ReasoningComments: Pre-printSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Neuro-Symbolic AI (NeSy) holds promise to ensure the safe deployment of AI systems, as interpretable symbolic techniques provide formal behaviour guarantees. The challenge is how to effectively integrate neural and symbolic computation, to enable learning and reasoning from raw data. Existing pipelines that train the neural and symbolic components sequentially require extensive labelling, whereas end-to-end approaches are limited in terms of scalability, due to the combinatorial explosion in the symbol grounding problem. In this paper, we leverage the implicit knowledge within foundation models to enhance the performance in NeSy tasks, whilst reducing the amount of data labelling and manual engineering. We introduce a new architecture, called NeSyGPT, which fine-tunes a vision-language foundation model to extract symbolic features from raw data, before learning a highly expressive answer set program to solve a downstream task. Our comprehensive evaluation demonstrates that NeSyGPT has superior accuracy over various baselines, and can scale to complex NeSy tasks. Finally, we highlight the effective use of a large language model to generate the programmatic interface between the neural and symbolic components, significantly reducing the amount of manual engineering required.
- [33] arXiv:2402.02033 [ pdf , ps , html , other ]
-
Title: Benchmark for CEC 2024 Competition on Multiparty Multiobjective OptimizationComments: 15 pages, 0 figuresSubjects: Artificial Intelligence (cs.AI) ; Neural and Evolutionary Computing (cs.NE)
Abstract: The competition focuses on Multiparty Multiobjective Optimization Problems (MPMOPs), where multiple decision makers have conflicting objectives, as seen in applications like UAV path planning. Despite their importance, MPMOPs remain understudied in comparison to conventional multiobjective optimization. The competition aims to address this gap by encouraging researchers to explore tailored modeling approaches. The test suite comprises two parts: problems with common Pareto optimal solutions and Biparty Multiobjective UAV Path Planning (BPMO-UAVPP) problems with unknown solutions. Optimization algorithms for the first part are evaluated using Multiparty Inverted Generational Distance (MPIGD), and the second part is evaluated using Multiparty Hypervolume (MPHV) metrics. The average algorithm ranking across all problems serves as a performance benchmark.
- [34] arXiv:2402.02053 [ pdf , ps , html , other ]
-
Title: Affordable Generative AgentsSubjects: Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC)
Abstract: The emergence of large language models (LLMs) has significantly advanced the simulation of believable interactive agents. However, the substantial cost on maintaining the prolonged agent interactions poses challenge over the deployment of believable LLM-based agents. Therefore, in this paper, we develop Affordable Generative Agents (AGA), a framework for enabling the generation of believable and low-cost interactions on both agent-environment and inter-agents levels. Specifically, for agent-environment interactions, we substitute repetitive LLM inferences with learned policies; while for inter-agent interactions, we model the social relationships between agents and compress auxiliary dialogue information. Extensive experiments on multiple environments show the effectiveness and efficiency of our proposed framework. Also, we delve into the mechanisms of emergent believable behaviors lying in LLM agents, demonstrating that agents can only generate finite behaviors in fixed environments, based upon which, we understand ways to facilitate emergent interaction behaviors. Our code is publicly available at: \url{ this https URL }.
- [35] arXiv:2402.02146 [ pdf , ps , other ]
-
Title: Emergency Computing: An Adaptive Collaborative Inference Method Based on Hierarchical Reinforcement LearningSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract: In achieving effective emergency response, the timely acquisition of environmental information, seamless command data transmission, and prompt decision-making are crucial. This necessitates the establishment of a resilient emergency communication dedicated network, capable of providing communication and sensing services even in the absence of basic infrastructure. In this paper, we propose an Emergency Network with Sensing, Communication, Computation, Caching, and Intelligence (E-SC3I). The framework incorporates mechanisms for emergency computing, caching, integrated communication and sensing, and intelligence empowerment. E-SC3I ensures rapid access to a large user base, reliable data transmission over unstable links, and dynamic network deployment in a changing environment. However, these advantages come at the cost of significant computation overhead. Therefore, we specifically concentrate on emergency computing and propose an adaptive collaborative inference method (ACIM) based on hierarchical reinforcement learning. Experimental results demonstrate our method's ability to achieve rapid inference of AI models with constrained computational and communication resources.
- [36] arXiv:2402.02164 [ pdf , ps , other ]
-
Title: TSIS: A Supplementary Algorithm to t-SMILES for Fragment-based Molecular RepresentationComments: 7 pages, 3 figures, 2 tablesSubjects: Artificial Intelligence (cs.AI) ; Biomolecules (q-bio.BM)
Abstract: String-based molecular representations, such as SMILES, are a de facto standard for linearly representing molecular information. However, the must be paired symbols and the parsing algorithm result in long grammatical dependencies, making it difficult for even state-of-the-art deep learning models to accurately comprehend the syntax and semantics. Although DeepSMILES and SELFIES have addressed certain limitations, they still struggle with advanced grammar, which makes some strings difficult to read. This study introduces a supplementary algorithm, TSIS (TSID Simplified), to t-SMILES family. Comparative experiments between TSIS and another fragment-based linear solution, SAFE, indicate that SAFE presents challenges in managing long-term dependencies in grammar. TSIS continues to use the tree defined in t-SMILES as its foundational data structure, which sets it apart from the SAFE model. The performance of TSIS models surpasses that of SAFE models, indicating that the tree structure of the t-SMILES family provides certain advantages.
- [37] arXiv:2402.02330 [ pdf , ps , html , other ]
-
Title: Enhance Reasoning for Large Language Models in the Game WerewolfSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: This paper presents an innovative framework that integrates Large Language Models (LLMs) with an external Thinker module to enhance the reasoning capabilities of LLM-based agents. Unlike augmenting LLMs with prompt engineering, Thinker directly harnesses knowledge from databases and employs various optimization techniques. The framework forms a reasoning hierarchy where LLMs handle intuitive System-1 tasks such as natural language processing, while the Thinker focuses on cognitive System-2 tasks that require complex logical analysis and domain-specific knowledge. Our framework is presented using a 9-player Werewolf game that demands dual-system reasoning. We introduce a communication protocol between LLMs and the Thinker, and train the Thinker using data from 18800 human sessions and reinforcement learning. Experiments demonstrate the framework's effectiveness in deductive reasoning, speech generation, and online game evaluation. Additionally, we fine-tune a 6B LLM to surpass GPT4 when integrated with the Thinker. This paper also contributes the largest dataset for social deduction games to date.
- [38] arXiv:2402.02392 [ pdf , ps , other ]
-
Title: DeLLMa: A Framework for Decision Making Under Uncertainty with Large Language ModelsComments: 23 pages, 17 figuresSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Large language models (LLMs) are increasingly used across society, including in domains like business, engineering, and medicine. These fields often grapple with decision-making under uncertainty, a critical yet challenging task. In this paper, we show that directly prompting LLMs on these types of decision-making problems yields poor results, especially as the problem complexity increases. To overcome this limitation, we propose DeLLMa (Decision-making Large Language Model assistant), a framework designed to enhance decision-making accuracy in uncertain environments. DeLLMa involves a multi-step scaffolding procedure, drawing upon principles from decision theory and utility theory, to provide an optimal and human-auditable decision-making process. We validate our framework on decision-making environments involving real agriculture and finance data. Our results show that DeLLMa can significantly improve LLM decision-making performance, achieving up to a 40% increase in accuracy over competing methods.
- [39] arXiv:2402.02468 [ pdf , ps , other ]
-
Title: Fast Peer Adaptation with Context-aware ExplorationSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Abstract: Fast adapting to unknown peers (partners or opponents) with different strategies is a key challenge in multi-agent games. To do so, it is crucial for the agent to efficiently probe and identify the peer's strategy, as this is the prerequisite for carrying out the best response in adaptation. However, it is difficult to explore the strategies of unknown peers, especially when the games are partially observable and have a long horizon. In this paper, we propose a peer identification reward, which rewards the learning agent based on how well it can identify the behavior pattern of the peer over the historical context, such as the observation over multiple episodes. This reward motivates the agent to learn a context-aware policy for effective exploration and fast adaptation, i.e., to actively seek and collect informative feedback from peers when uncertain about their policies and to exploit the context to perform the best response when confident. We evaluate our method on diverse testbeds that involve competitive (Kuhn Poker), cooperative (PO-Overcooked), or mixed (Predator-Prey-W) games with peer agents. We demonstrate that our method induces more active exploration behavior, achieving faster adaptation and better outcomes than existing methods.
- [40] arXiv:2402.02547 [ pdf , ps , other ]
-
Title: Integration of cognitive tasks into artificial general intelligence test for large modelsYouzhi Qu , Chen Wei , Penghui Du , Wenxin Che , Chi Zhang , Wanli Ouyang , Yatao Bian , Feiyang Xu , Bin Hu , Kai Du , Haiyan Wu , Jia Liu , Quanying LiuSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: During the evolution of large models, performance evaluation is necessarily performed to assess their capabilities and ensure safety before practical application. However, current model evaluations mainly rely on specific tasks and datasets, lacking a united framework for assessing the multidimensional intelligence of large models. In this perspective, we advocate for a comprehensive framework of cognitive science-inspired artificial general intelligence (AGI) tests, aimed at fulfilling the testing needs of large models with enhanced capabilities. The cognitive science-inspired AGI tests encompass the full spectrum of intelligence facets, including crystallized intelligence, fluid intelligence, social intelligence, and embodied intelligence. To assess the multidimensional intelligence of large models, the AGI tests consist of a battery of well-designed cognitive tests adopted from human intelligence tests, and then naturally encapsulates into an immersive virtual community. We propose increasing the complexity of AGI testing tasks commensurate with advancements in large models and emphasizing the necessity for the interpretation of test results to avoid false negatives and false positives. We believe that cognitive science-inspired AGI tests will effectively guide the targeted improvement of large models in specific dimensions of intelligence and accelerate the integration of large models into human society.
- [41] arXiv:2402.02611 [ pdf , ps , html , other ]
-
Title: PuzzleBench: Can LLMs Solve Challenging First-Order Combinatorial Reasoning Problems?Subjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Recent works show that the largest of the large language models (LLMs) can solve many simple reasoning tasks expressed in natural language, without any/much supervision. But, can they also solve challenging first-order combinatorial reasoning problems, such as graph coloring, knapsack and cryptarithmetic? To answer this question, we present PuzzleBench, a dataset of 31 such challenging problems along with a few solved instances for each problem. These problems are all first order, i.e., they can be instantiated with problem instances of varying sizes, and most of them are NP-hard, requiring several reasoning steps to reach the solution. We first observe that LLMs, even when aided by symbolic solvers, perform rather poorly on our dataset. In response, we propose a new approach, Puzzle-LM, which combines LLMs with both symbolic solvers and program interpreters, along with feedback from solved examples, to achieve huge performance gains. Our extensive experimentation and analyses offer new insights into the reasoning abilities and limitations of present-day LLMs.
- [42] arXiv:2402.02658 [ pdf , ps , html , other ]
-
Title: Multi-step Problem Solving Through a Verifier: An Empirical Analysis on Model-induced Process SupervisionSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Process supervision, using a trained verifier to evaluate the intermediate steps generated by reasoner, has demonstrated significant improvements in multi-step problem solving. In this paper, to avoid expensive human annotation effort on the verifier training data, we introduce Model-induced Process Supervision (MiPS), a novel method for automating data curation. MiPS annotates an intermediate step by sampling completions of this solution through the reasoning model, and obtaining an accuracy defined as the proportion of correct completions. Errors in the reasoner would cause MiPS to underestimate the accuracy of intermediate steps, therefore, we suggest and empirically show that verification focusing on high predicted scores of the verifier shall be preferred over that of low predicted scores, contrary to prior work. Our approach significantly improves the performance of PaLM 2 on math and coding tasks (accuracy +0.67% on GSM8K, +4.16% on MATH, +0.92% on MBPP compared with an output supervision trained verifier). Additionally, our study demonstrates that the verifier exhibits strong generalization ability across different reasoning models.
- [43] arXiv:2402.02716 [ pdf , ps , other ]
-
Title: Understanding the planning of LLM agents: A surveyXu Huang , Weiwen Liu , Xiaolong Chen , Xingmei Wang , Hao Wang , Defu Lian , Yasheng Wang , Ruiming Tang , Enhong ChenComments: 9 pages, 2 tables, 2 figuresSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: As Large Language Models (LLMs) have shown significant intelligence, the progress to leverage LLMs as planning modules of autonomous agents has attracted more attention. This survey provides the first systematic view of LLM-based agents planning, covering recent works aiming to improve planning ability. We provide a taxonomy of existing works on LLM-Agent planning, which can be categorized into Task Decomposition, Plan Selection, External Module, Reflection and Memory. Comprehensive analyses are conducted for each direction, and further challenges for the field of research are discussed.
- [44] arXiv:2402.02805 [ pdf , ps , other ]
-
Title: Graph-enhanced Large Language Models in Asynchronous Plan ReasoningFangru Lin , Emanuele La Malfa , Valentin Hofmann , Elle Michelle Yang , Anthony Cohn , Janet B. PierrehumbertSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Reasoning about asynchronous plans is challenging since it requires sequential and parallel planning to optimize time costs. Can large language models (LLMs) succeed at this task? Here, we present the first large-scale study investigating this question. We find that a representative set of closed and open-source LLMs, including GPT-4 and LLaMA-2, behave poorly when not supplied with illustrations about the task-solving process in our benchmark AsyncHow. We propose a novel technique called Plan Like a Graph (PLaG) that combines graphs with natural language prompts and achieves state-of-the-art results. We show that although PLaG can boost model performance, LLMs still suffer from drastic degradation when task complexity increases, highlighting the limits of utilizing LLMs for simulating digital devices. We see our study as an exciting step towards using LLMs as efficient autonomous agents.
- [45] arXiv:2402.02885 [ pdf , ps , other ]
-
Title: A Review on Building Blocks of Decentralized Artificial IntelligenceComments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessibleSubjects: Artificial Intelligence (cs.AI) ; Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: Artificial intelligence is transforming our lives, and technological progress and transfer from the academic and theoretical sphere to the real world are accelerating yearly. But during that progress and transition, several open problems and questions need to be addressed for the field to develop ethically, such as digital privacy, ownership, and control. These are some of the reasons why the currently most popular approaches of artificial intelligence, i.e., centralized AI (CEAI), are questionable, with other directions also being widely explored, such as decentralized artificial intelligence (DEAI), to solve some of the most reaching problems. This paper provides a systematic literature review (SLR) of existing work in the field of DEAI, presenting the findings of 71 identified studies. The paper's primary focus is identifying the building blocks of DEAI solutions and networks, tackling the DEAI analysis from a bottom-up approach. In the end, future directions of research and open problems are proposed.
- [46] arXiv:2402.02978 [ pdf , ps , other ]
-
Title: Evaluating Datalog Tools for Meta-reasoning over OWL 2 QLComments: Under consideration in Theory and Practice of Logic Programming (TPLP)Subjects: Artificial Intelligence (cs.AI) ; Logic in Computer Science (cs.LO)
Abstract: Metamodeling is a general approach to expressing knowledge about classes and properties in an ontology. It is a desirable modeling feature in multiple applications that simplifies the extension and reuse of ontologies. Nevertheless, allowing metamodeling without restrictions is problematic for several reasons, mainly due to undecidability issues. Practical languages, therefore, forbid classes to occur as instances of other classes or treat such occurrences as semantically different objects. Specifically, meta-querying in SPARQL under the Direct Semantic Entailment Regime (DSER) uses the latter approach, thereby effectively not supporting meta-queries. However, several extensions enabling different metamodeling features have been proposed over the last decade. This paper deals with the Metamodeling Semantics (MS) over OWL 2 QL and the Metamodeling Semantic Entailment Regime (MSER), as proposed in Lenzerini et al. (2015) and Lenzerini et al. (2020); Cima et al. (2017). A reduction from OWL 2 QL to Datalog for meta-querying was proposed in Cima et al. (2017). In this paper, we experiment with various logic programming tools that support Datalog querying to determine their suitability as back-ends to MSER query answering. These tools stem from different logic programming paradigms (Prolog, pure Datalog, Answer Set Programming, Hybrid Knowledge Bases). Our work shows that the Datalog approach to MSER querying is practical also for sizeable ontologies with limited resources (time and memory). This paper significantly extends Qureshi & Faber (2021) by a more detailed experimental analysis and more background. Under consideration in Theory and Practice of Logic Programming (TPLP).
- [47] arXiv:2402.03136 [ pdf , ps , other ]
-
Title: Mastering Zero-Shot Interactions in Cooperative and Competitive Simultaneous GamesSubjects: Artificial Intelligence (cs.AI) ; Computer Science and Game Theory (cs.GT)
Abstract: The combination of self-play and planning has achieved great successes in sequential games, for instance in Chess and Go. However, adapting algorithms such as AlphaZero to simultaneous games poses a new challenge. In these games, missing information about concurrent actions of other agents is a limiting factor as they may select different Nash equilibria or do not play optimally at all. Thus, it is vital to model the behavior of the other agents when interacting with them in simultaneous games. To this end, we propose Albatross: AlphaZero for Learning Bounded-rational Agents and Temperature-based Response Optimization using Simulated Self-play. Albatross learns to play the novel equilibrium concept of a Smooth Best Response Logit Equilibrium (SBRLE), which enables cooperation and competition with agents of any playing strength. We perform an extensive evaluation of Albatross on a set of cooperative and competitive simultaneous perfect-information games. In contrast to AlphaZero, Albatross is able to exploit weak agents in the competitive game of Battlesnake. Additionally, it yields an improvement of 37.6% compared to previous state of the art in the cooperative Overcooked benchmark.
- [48] arXiv:2402.03164 [ pdf , ps , html , other ]
-
Title: Decidable Reasoning About Time in Finite-Domain Situation Calculus TheoriesSubjects: Artificial Intelligence (cs.AI) ; Logic in Computer Science (cs.LO)
Abstract: Representing time is crucial for cyber-physical systems and has been studied extensively in the Situation Calculus. The most commonly used approach represents time by adding a real-valued fluent $\mathit{time}(a)$ that attaches a time point to each action and consequently to each situation. We show that in this approach, checking whether there is a reachable situation that satisfies a given formula is undecidable, even if the domain of discourse is restricted to a finite set of objects. We present an alternative approach based on well-established results from timed automata theory by introducing clocks as real-valued fluents with restricted successor state axioms and comparison operators. %that only allow comparisons against fixed rationals. With this restriction, we can show that the reachability problem for finite-domain basic action theories is decidable. Finally, we apply our results on Golog program realization by presenting a decidable procedure for determining an action sequence that is a successful execution of a given program.
- [49] arXiv:2402.03181 [ pdf , ps , html , other ]
-
Title: C-RAG: Certified Generation Risks for Retrieval-Augmented Language ModelsSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Information Retrieval (cs.IR)
Abstract: Despite the impressive capabilities of large language models (LLMs) across diverse applications, they still suffer from trustworthiness issues, such as hallucinations and misalignments. Retrieval-augmented language models (RAG) have been proposed to enhance the credibility of generations by grounding external knowledge, but the theoretical understandings of their generation risks remains unexplored. In this paper, we answer: 1) whether RAG can indeed lead to low generation risks, 2) how to provide provable guarantees on the generation risks of RAG and vanilla LLMs, and 3) what sufficient conditions enable RAG models to reduce generation risks. We propose C-RAG, the first framework to certify generation risks for RAG models. Specifically, we provide conformal risk analysis for RAG models and certify an upper confidence bound of generation risks, which we refer to as conformal generation risk. We also provide theoretical guarantees on conformal generation risks for general bounded risk functions under test distribution shifts. We prove that RAG achieves a lower conformal generation risk than that of a single LLM when the quality of the retrieval model and transformer is non-trivial. Our intensive empirical results demonstrate the soundness and tightness of our conformal generation risk guarantees across four widely-used NLP datasets on four state-of-the-art retrieval models.
- [50] arXiv:2402.03310 [ pdf , ps , other ]
-
Title: V-IRL: Grounding Virtual Intelligence in Real LifeComments: Project page: this https URLSubjects: Artificial Intelligence (cs.AI) ; Computer Vision and Pattern Recognition (cs.CV)
Abstract: There is a sensory gulf between the Earth that humans inhabit and the digital realms in which modern AI agents are created. To develop AI agents that can sense, think, and act as flexibly as humans in real-world settings, it is imperative to bridge the realism gap between the digital and physical worlds. How can we embody agents in an environment as rich and diverse as the one we inhabit, without the constraints imposed by real hardware and control? Towards this end, we introduce V-IRL: a platform that enables agents to scalably interact with the real world in a virtual yet realistic environment. Our platform serves as a playground for developing agents that can accomplish various practical tasks and as a vast testbed for measuring progress in capabilities spanning perception, decision-making, and interaction with real-world data across the entire globe.
- [51] arXiv:2402.03375 [ pdf , ps , html , other ]
-
Title: BetterV: Controlled Verilog Generation with Discriminative GuidanceComments: Accepted by ICML 2024Subjects: Artificial Intelligence (cs.AI) ; Programming Languages (cs.PL)
Abstract: Due to the growing complexity of modern Integrated Circuits (ICs), there is a need for automated circuit design methods. Recent years have seen rising research in hardware design language generation to facilitate the design process. In this work, we propose a Verilog generation framework, BetterV, which fine-tunes the large language models (LLMs) on processed domain-specific datasets and incorporates generative discriminators for guidance on particular design demands. The Verilog modules are collected, filtered and processed from internet to form a clean and abundant dataset. Instruct-tuning methods are specially designed to fine-tune the LLMs to understand the knowledge about Verilog. Furthermore, data are augmented to enrich the training set and also used to train a generative discriminator on particular downstream task, which leads a guidance for the LLMs to optimize the Verilog implementation. BetterV has the ability to generate syntactically and functionally correct Verilog, which can outperform GPT-4 on the VerilogEval benchmark. With the help of task-specific generative discriminator, BetterV can achieve remarkable improvement on various electronic design automation (EDA) downstream tasks, including the netlist node reduction for synthesis and verification runtime reduction with Boolean Satisfiability (SAT) solving.
- [52] arXiv:2402.03388 [ pdf , ps , html , other ]
-
Title: Delivery Optimized Discovery in Behavioral User Segmentation under Budget ConstraintHarshita Chopra , Atanu R. Sinha , Sunav Choudhary , Ryan A. Rossi , Paavan Kumar Indela , Veda Pranav Parwatala , Srinjayee Paul , Aurghya MaitiSubjects: Artificial Intelligence (cs.AI) ; Information Retrieval (cs.IR); Machine Learning (cs.LG)
Abstract: Users' behavioral footprints online enable firms to discover behavior-based user segments (or, segments) and deliver segment specific messages to users. Following the discovery of segments, delivery of messages to users through preferred media channels like Facebook and Google can be challenging, as only a portion of users in a behavior segment find match in a medium, and only a fraction of those matched actually see the message (exposure). Even high quality discovery becomes futile when delivery fails. Many sophisticated algorithms exist for discovering behavioral segments; however, these ignore the delivery component. The problem is compounded because (i) the discovery is performed on the behavior data space in firms' data (e.g., user clicks), while the delivery is predicated on the static data space (e.g., geo, age) as defined by media; and (ii) firms work under budget constraint. We introduce a stochastic optimization based algorithm for delivery optimized discovery of behavioral user segmentation and offer new metrics to address the joint optimization. We leverage optimization under a budget constraint for delivery combined with a learning-based component for discovery. Extensive experiments on a public dataset from Google and a proprietary dataset show the effectiveness of our approach by simultaneously improving delivery metrics, reducing budget spend and achieving strong predictive performance in discovery.
- [53] arXiv:2402.03494 [ pdf , ps , html , other ]
-
Title: Beyond Text: Utilizing Vocal Cues to Improve Decision Making in LLMs for Robot Navigation TasksComments: 28 pages, 7 figuresSubjects: Artificial Intelligence (cs.AI) ; Robotics (cs.RO)
Abstract: While LLMs excel in processing text in these human conversations, they struggle with the nuances of verbal instructions in scenarios like social navigation, where ambiguity and uncertainty can erode trust in robotic and other AI systems. We can address this shortcoming by moving beyond text and additionally focusing on the paralinguistic features of these audio responses. These features are the aspects of spoken communication that do not involve the literal wording (lexical content) but convey meaning and nuance through how something is said. We present \emph{Beyond Text}; an approach that improves LLM decision-making by integrating audio transcription along with a subsection of these features, which focus on the affect and more relevant in human-robot conversations.This approach not only achieves a 70.26\% winning rate, outperforming existing LLMs by 22.16\% to 48.30\% (gemini-1.5-pro and gpt-3.5 respectively), but also enhances robustness against token manipulation adversarial attacks, highlighted by a 22.44\% less decrease ratio than the text-only language model in winning rate. ``\textit{Beyond Text}'' marks an advancement in social robot navigation and broader Human-Robot interactions, seamlessly integrating text-based guidance with human-audio-informed language models.
- [54] arXiv:2402.03507 [ pdf , ps , html , other ]
-
Title: Neural networks for abstraction and reasoning: Towards broad generalization in machinesComments: 32 pages main text, 17 pagesSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: For half a century, artificial intelligence research has attempted to reproduce the human qualities of abstraction and reasoning - creating computer systems that can learn new concepts from a minimal set of examples, in settings where humans find this easy. While specific neural networks are able to solve an impressive range of problems, broad generalisation to situations outside their training data has proved this http URL this work, we look at several novel approaches for solving the Abstraction & Reasoning Corpus (ARC), a dataset of abstract visual reasoning tasks introduced to test algorithms on broad generalization. Despite three international competitions with $100,000 in prizes, the best algorithms still fail to solve a majority of ARC tasks and rely on complex hand-crafted rules, without using machine learning at all. We revisit whether recent advances in neural networks allow progress on this task.
First, we adapt the DreamCoder neurosymbolic reasoning solver to ARC. DreamCoder automatically writes programs in a bespoke domain-specific language to perform reasoning, using a neural network to mimic human intuition. We present the Perceptual Abstraction and Reasoning Language (PeARL) language, which allows DreamCoder to solve ARC tasks, and propose a new recognition model that allows us to significantly improve on the previous best implementation.We also propose a new encoding and augmentation scheme that allows large language models (LLMs) to solve ARC tasks, and find that the largest models can solve some ARC tasks. LLMs are able to solve a different group of problems to state-of-the-art solvers, and provide an interesting way to complement other approaches. We perform an ensemble analysis, combining models to achieve better results than any system alone. Finally, we publish the arckit Python library to make future research on ARC easier. - [55] arXiv:2402.03539 [ pdf , ps , html , other ]
-
Title: Extended Version of: On the Structural Hardness of Answer Set Programming: Can Structure Efficiently Confine the Power of Disjunctions?Subjects: Artificial Intelligence (cs.AI) ; Logic in Computer Science (cs.LO)
Abstract: Answer Set Programming (ASP) is a generic problem modeling and solving framework with a strong focus on knowledge representation and a rapid growth of industrial applications. So far, the study of complexity resulted in characterizing hardness and determining their sources, fine-grained insights in the form of dichotomy-style results, as well as detailed parameterized complexity landscapes. Unfortunately, for the well-known parameter treewidth disjunctive programs require double-exponential runtime under reasonable complexity assumptions. This quickly becomes out of reach. We deal with the classification of structural parameters for disjunctive ASP on the program's rule structure (incidence graph).
First, we provide a polynomial kernel to obtain single-exponential runtime in terms of vertex cover size, despite subset-minimization being not represented in the program's structure. Then we turn our attention to strictly better structural parameters between vertex cover size and treewidth. Here, we provide double-exponential lower bounds for the most prominent parameters in that range: treedepth, feedback vertex size, and cliquewidth. Based on this, we argue that unfortunately our options beyond vertex cover size are limited. Our results provide an in-depth hardness study, relying on a novel reduction from normal to disjunctive programs, trading the increase of complexity for an exponential parameter compression. - [56] arXiv:2402.03575 [ pdf , ps , html , other ]
-
Title: Toward Human-AI Alignment in Large-Scale Multi-Player GamesSugandha Sharma , Guy Davidson , Khimya Khetarpal , Anssi Kanervisto , Udit Arora , Katja Hofmann , Ida MomennejadSubjects: Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC)
Abstract: Achieving human-AI alignment in complex multi-agent games is crucial for creating trustworthy AI agents that enhance gameplay. We propose a method to evaluate this alignment using an interpretable task-sets framework, focusing on high-level behavioral tasks instead of low-level policies. Our approach has three components. First, we analyze extensive human gameplay data from Xbox's Bleeding Edge (100K+ games), uncovering behavioral patterns in a complex task space. This task space serves as a basis set for a behavior manifold capturing interpretable axes: fight-flight, explore-exploit, and solo-multi-agent. Second, we train an AI agent to play Bleeding Edge using a Generative Pretrained Causal Transformer and measure its behavior. Third, we project human and AI gameplay to the proposed behavior manifold to compare and contrast. This allows us to interpret differences in policy as higher-level behavioral concepts, e.g., we find that while human players exhibit variability in fight-flight and explore-exploit behavior, AI players tend towards uniformity. Furthermore, AI agents predominantly engage in solo play, while humans often engage in cooperative and competitive multi-agent patterns. These stark differences underscore the need for interpretable evaluation, design, and integration of AI in human-aligned applications. Our study advances the alignment discussion in AI and especially generative AI research, offering a measurable framework for interpretable human-agent alignment in multiplayer gaming.
- [57] arXiv:2402.03607 [ pdf , ps , html , other ]
-
Title: Improving Contextual Congruence Across Modalities for Effective Multimodal Marketing using Knowledge-infused LearningSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Abstract: The prevalence of smart devices with the ability to capture moments in multiple modalities has enabled users to experience multimodal information online. However, large Language (LLMs) and Vision models (LVMs) are still limited in capturing holistic meaning with cross-modal semantic relationships. Without explicit, common sense knowledge (e.g., as a knowledge graph), Visual Language Models (VLMs) only learn implicit representations by capturing high-level patterns in vast corpora, missing essential contextual cross-modal cues. In this work, we design a framework to couple explicit commonsense knowledge in the form of knowledge graphs with large VLMs to improve the performance of a downstream task, predicting the effectiveness of multi-modal marketing campaigns. While the marketing application provides a compelling metric for assessing our methods, our approach enables the early detection of likely persuasive multi-modal campaigns and the assessment and augmentation of marketing theory.
- [58] arXiv:2402.03618 [ pdf , ps , html , other ]
-
Title: Comparing Abstraction in Humans and Large Language Models Using Multimodal Serial ReproductionSreejan Kumar , Raja Marjieh , Byron Zhang , Declan Campbell , Michael Y. Hu , Umang Bhatt , Brenden Lake , Thomas L. GriffithsSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Neurons and Cognition (q-bio.NC)
Abstract: Humans extract useful abstractions of the world from noisy sensory data. Serial reproduction allows us to study how people construe the world through a paradigm similar to the game of telephone, where one person observes a stimulus and reproduces it for the next to form a chain of reproductions. Past serial reproduction experiments typically employ a single sensory modality, but humans often communicate abstractions of the world to each other through language. To investigate the effect language on the formation of abstractions, we implement a novel multimodal serial reproduction framework by asking people who receive a visual stimulus to reproduce it in a linguistic format, and vice versa. We ran unimodal and multimodal chains with both humans and GPT-4 and find that adding language as a modality has a larger effect on human reproductions than GPT-4's. This suggests human visual and linguistic representations are more dissociable than those of GPT-4.
- [59] arXiv:2402.03620 [ pdf , ps , html , other ]
-
Title: Self-Discover: Large Language Models Self-Compose Reasoning StructuresPei Zhou , Jay Pujara , Xiang Ren , Xinyun Chen , Heng-Tze Cheng , Quoc V. Le , Ed H. Chi , Denny Zhou , Swaroop Mishra , Huaixiu Steven ZhengComments: 17 pages, 11 figures, 5 tablesSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: We introduce SELF-DISCOVER, a general framework for LLMs to self-discover the task-intrinsic reasoning structures to tackle complex reasoning problems that are challenging for typical prompting methods. Core to the framework is a self-discovery process where LLMs select multiple atomic reasoning modules such as critical thinking and step-by-step thinking, and compose them into an explicit reasoning structure for LLMs to follow during decoding. SELF-DISCOVER substantially improves GPT-4 and PaLM 2's performance on challenging reasoning benchmarks such as BigBench-Hard, grounded agent reasoning, and MATH, by as much as 32% compared to Chain of Thought (CoT). Furthermore, SELF-DISCOVER outperforms inference-intensive methods such as CoT-Self-Consistency by more than 20%, while requiring 10-40x fewer inference compute. Finally, we show that the self-discovered reasoning structures are universally applicable across model families: from PaLM 2-L to GPT-4, and from GPT-4 to Llama2, and share commonalities with human reasoning patterns.
- [60] arXiv:2402.03640 [ pdf , ps , html , other ]
-
Title: torchmSAT: A GPU-Accelerated Approximation To The Maximum Satisfiability ProblemSubjects: Artificial Intelligence (cs.AI)
Abstract: The remarkable achievements of machine learning techniques in analyzing discrete structures have drawn significant attention towards their integration into combinatorial optimization algorithms. Typically, these methodologies improve existing solvers by injecting learned models within the solving loop to enhance the efficiency of the search process. In this work, we derive a single differentiable function capable of approximating solutions for the Maximum Satisfiability Problem (MaxSAT). Then, we present a novel neural network architecture to model our differentiable function, and progressively solve MaxSAT using backpropagation. This approach eliminates the need for labeled data or a neural network training phase, as the training process functions as the solving algorithm. Additionally, we leverage the computational power of GPUs to accelerate these computations. Experimental results on challenging MaxSAT instances show that our proposed methodology outperforms two existing MaxSAT solvers, and is on par with another in terms of solution cost, without necessitating any training or access to an underlying SAT solver. Given that numerous NP-hard problems can be reduced to MaxSAT, our novel technique paves the way for a new generation of solvers poised to benefit from neural network GPU acceleration.
- [61] arXiv:2402.03678 [ pdf , ps , html , other ]
-
Title: Logical Specifications-guided Dynamic Task Sampling for Reinforcement Learning AgentsYash Shukla , Tanushree Burman , Abhishek Kulkarni , Robert Wright , Alvaro Velasquez , Jivko SinapovSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Robotics (cs.RO)
Abstract: Reinforcement Learning (RL) has made significant strides in enabling artificial agents to learn diverse behaviors. However, learning an effective policy often requires a large number of environment interactions. To mitigate sample complexity issues, recent approaches have used high-level task specifications, such as Linear Temporal Logic (LTL$_f$) formulas or Reward Machines (RM), to guide the learning progress of the agent. In this work, we propose a novel approach, called Logical Specifications-guided Dynamic Task Sampling (LSTS), that learns a set of RL policies to guide an agent from an initial state to a goal state based on a high-level task specification, while minimizing the number of environmental interactions. Unlike previous work, LSTS does not assume information about the environment dynamics or the Reward Machine, and dynamically samples promising tasks that lead to successful goal policies. We evaluate LSTS on a gridworld and show that it achieves improved time-to-threshold performance on complex sequential decision-making problems compared to state-of-the-art RM and Automaton-guided RL baselines, such as Q-Learning for Reward Machines and Compositional RL from logical Specifications (DIRL). Moreover, we demonstrate that our method outperforms RM and Automaton-guided RL baselines in terms of sample-efficiency, both in a partially observable robotic task and in a continuous control robotic manipulation task.
- [62] arXiv:2402.03728 [ pdf , ps , other ]
-
Title: Consistent Joint Decision-Making with Heterogeneous Learning ModelsComments: EACL 2024 Findings - Short PaperJournal-ref: EACL 2024Subjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG); Logic in Computer Science (cs.LO)
Abstract: This paper introduces a novel decision-making framework that promotes consistency among decisions made by diverse models while utilizing external knowledge. Leveraging the Integer Linear Programming (ILP) framework, we map predictions from various models into globally normalized and comparable values by incorporating information about decisions' prior probability, confidence (uncertainty), and the models' expected accuracy. Our empirical study demonstrates the superiority of our approach over conventional baselines on multiple datasets.
- [63] arXiv:2402.03732 [ pdf , ps , other ]
-
Title: Deep Outdated Fact Detection in Knowledge GraphsComments: 10 pages, 6 figuresJournal-ref: 2023 IEEE International Conference on Data Mining Workshops (ICDMW), December 1-4, 2023, Shanghai, ChinaSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Digital Libraries (cs.DL); Machine Learning (cs.LG)
Abstract: Knowledge graphs (KGs) have garnered significant attention for their vast potential across diverse domains. However, the issue of outdated facts poses a challenge to KGs, affecting their overall quality as real-world information evolves. Existing solutions for outdated fact detection often rely on manual recognition. In response, this paper presents DEAN (Deep outdatEd fAct detectioN), a novel deep learning-based framework designed to identify outdated facts within KGs. DEAN distinguishes itself by capturing implicit structural information among facts through comprehensive modeling of both entities and relations. To effectively uncover latent out-of-date information, DEAN employs a contrastive approach based on a pre-defined Relations-to-Nodes (R2N) graph, weighted by the number of entities. Experimental results demonstrate the effectiveness and superiority of DEAN over state-of-the-art baseline methods.
- [64] arXiv:2402.03755 [ pdf , ps , other ]
-
Title: QuantAgent: Seeking Holy Grail in Trading by Self-Improving Large Language ModelSubjects: Artificial Intelligence (cs.AI) ; Computational Finance (q-fin.CP)
Abstract: Autonomous agents based on Large Language Models (LLMs) that devise plans and tackle real-world challenges have gained prominence.However, tailoring these agents for specialized domains like quantitative investment remains a formidable task. The core challenge involves efficiently building and integrating a domain-specific knowledge base for the agent's learning process. This paper introduces a principled framework to address this challenge, comprising a two-layer this http URL the inner loop, the agent refines its responses by drawing from its knowledge base, while in the outer loop, these responses are tested in real-world scenarios to automatically enhance the knowledge base with new insights.We demonstrate that our approach enables the agent to progressively approximate optimal behavior with provable efficiency.Furthermore, we instantiate this framework through an autonomous agent for mining trading signals named QuantAgent. Empirical results showcase QuantAgent's capability in uncovering viable financial signals and enhancing the accuracy of financial forecasts.
- [65] arXiv:2402.03822 [ pdf , ps , html , other ]
-
Title: RevOrder: A Novel Method for Enhanced Arithmetic in Language ModelsSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: This paper presents RevOrder, a novel technique aimed at improving arithmetic operations in large language models (LLMs) by reversing the output digits in addition, subtraction, and n-digit by 1-digit (nD by 1D) multiplication tasks. Our method significantly reduces the Count of Sequential Intermediate Digits (CSID) to $\mathcal{O}(1)$, a new metric we introduce to assess equation complexity. Through comprehensive testing, RevOrder not only achieves perfect accuracy in basic arithmetic operations but also substantially boosts LLM performance in division tasks, particularly with large numbers where traditional models struggle. Implementation of RevOrder is cost-effective for both training and inference phases. Moreover, applying RevOrder to fine-tune the LLaMA2-7B model on the GSM8K math task results in a considerable improvement, reducing equation calculation errors by 46% and increasing overall scores from 41.6 to 44.4.
- [66] arXiv:2402.03824 [ pdf , ps , other ]
-
Title: A call for embodied AIComments: Submitted to ICML 2024 Position paper trackSubjects: Artificial Intelligence (cs.AI)
Abstract: We propose Embodied AI as the next fundamental step in the pursuit of Artificial General Intelligence, juxtaposing it against current AI advancements, particularly Large Language Models. We traverse the evolution of the embodiment concept across diverse fields - philosophy, psychology, neuroscience, and robotics - to highlight how EAI distinguishes itself from the classical paradigm of static learning. By broadening the scope of Embodied AI, we introduce a theoretical framework based on cognitive architectures, emphasizing perception, action, memory, and learning as essential components of an embodied agent. This framework is aligned with Friston's active inference principle, offering a comprehensive approach to EAI development. Despite the progress made in the field of AI, substantial challenges, such as the formulation of a novel AI learning theory and the innovation of advanced hardware, persist. Our discussion lays down a foundational guideline for future Embodied AI research. Highlighting the importance of creating Embodied AI agents capable of seamless communication, collaboration, and coexistence with humans and other intelligent entities within real-world environments, we aim to steer the AI community towards addressing the multifaceted challenges and seizing the opportunities that lie ahead in the quest for AGI.
- [67] arXiv:2402.03962 [ pdf , ps , html , other ]
-
Title: Position Paper: Against Spurious Sparks $-$ Dovelating Inflated AI ClaimsComments: 20 pages, 15 figures. Preliminary work. Under review by the International Conference on Machine Learning (ICML)Subjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Humans have a tendency to see 'human'-like qualities in objects around them. We name our cars, and talk to pets and even household appliances, as if they could understand us as other humans do. This behavior, called anthropomorphism, is also seeing traction in Machine Learning (ML), where human-like intelligence is claimed to be perceived in Large Language Models (LLMs). In this position paper, considering professional incentives, human biases, and general methodological setups, we discuss how the current search for Artificial General Intelligence (AGI) is a perfect storm for over-attributing human-like qualities to LLMs. In several experiments, we demonstrate that the discovery of human-interpretable patterns in latent spaces should not be a surprising outcome. Also in consideration of common AI portrayal in the media, we call for the academic community to exercise extra caution, and to be extra aware of principles of academic integrity, in interpreting and communicating about AI research outcomes.
- [68] arXiv:2402.04140 [ pdf , ps , other ]
-
Title: Advancing Legal Reasoning: The Integration of AI to Navigate Complexities and Biases in Global Jurisprudence with Semi-Automated Arbitration Processes (SAAPs)Subjects: Artificial Intelligence (cs.AI) ; Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Abstract: This study consists of a novel approach toward the analysis of court judgments spanning five countries, including the United States, the United Kingdom, Rwanda, Sweden and Hong Kong. This study also explores the intersection of the latest advancements in artificial intelligence (AI) and legal analysis, emphasizing the role of AI (specifically generative AI) in identifying human biases and facilitating automated, valid, and coherent multisided argumentation of court judgments with the goal of ensuring consistent application of laws in and across various jurisdictions. By incorporating Advanced Language Models (ALMs) and a newly introduced human-AI collaborative framework, this paper seeks to analyze Grounded Theory-based research design with Advanced Language Models (ALMs) in the practice of law. SHIRLEY is the name of the AI-based application (built on top of OpenAI's GPT technology), focusing on detecting logical inconsistencies and biases across various legal decisions. SHIRLEY analysis is aggregated and is accompanied by a comparison-oriented AI-based application called SAM (also an ALM) to identify relative deviations in SHIRLEY bias detections. Further, a CRITIC is generated within semi-autonomous arbitration process via the ALM, SARA. A novel approach is introduced in the utilization of an AI arbitrator to critically evaluate biases and qualitative-in-nature nuances identified by the aforementioned AI applications (SAM in concert with SHIRLEY), based on the Hague Rules on Business and Human Rights Arbitration. This Semi-Automated Arbitration Process (SAAP) aims to uphold the integrity and fairness of legal judgments by ensuring a nuanced debate-resultant "understanding" through a hybrid system of AI and human-based collaborative analysis.
- [69] arXiv:2402.04154 [ pdf , ps , html , other ]
-
Title: Read to Play (R2-Play): Decision Transformer with Multimodal Game InstructionYonggang Jin , Ge Zhang , Hao Zhao , Tianyu Zheng , Jiawei Guo , Liuyu Xiang , Shawn Yue , Stephen W. Huang , Zhaofeng He , Jie FuSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Developing a generalist agent is a longstanding objective in artificial intelligence. Previous efforts utilizing extensive offline datasets from various tasks demonstrate remarkable performance in multitasking scenarios within Reinforcement Learning. However, these works encounter challenges in extending their capabilities to new tasks. Recent approaches integrate textual guidance or visual trajectory into decision networks to provide task-specific contextual cues, representing a promising direction. However, it is observed that relying solely on textual guidance or visual trajectory is insufficient for accurately conveying the contextual information of tasks. This paper explores enhanced forms of task guidance for agents, enabling them to comprehend gameplay instructions, thereby facilitating a "read-to-play" capability. Drawing inspiration from the success of multimodal instruction tuning in visual tasks, we treat the visual-based RL task as a long-horizon vision task and construct a set of multimodal game instructions to incorporate instruction tuning into a decision transformer. Experimental results demonstrate that incorporating multimodal game instructions significantly enhances the decision transformer's multitasking and generalization capabilities.
- [70] arXiv:2402.04203 [ pdf , ps , other ]
-
Title: Human-Like Geometric Abstraction in Large Pre-trained Neural NetworksSubjects: Artificial Intelligence (cs.AI) ; Neurons and Cognition (q-bio.NC)
Abstract: Humans possess a remarkable capacity to recognize and manipulate abstract structure, which is especially apparent in the domain of geometry. Recent research in cognitive science suggests neural networks do not share this capacity, concluding that human geometric abilities come from discrete symbolic structure in human mental representations. However, progress in artificial intelligence (AI) suggests that neural networks begin to demonstrate more human-like reasoning after scaling up standard architectures in both model size and amount of training data. In this study, we revisit empirical results in cognitive science on geometric visual processing and identify three key biases in geometric visual processing: a sensitivity towards complexity, regularity, and the perception of parts and relations. We test tasks from the literature that probe these biases in humans and find that large pre-trained neural network models used in AI demonstrate more human-like abstract geometric processing.
- [71] arXiv:2402.04210 [ pdf , ps , other ]
-
Title: "Task Success" is not Enough: Investigating the Use of Video-Language Models as Behavior Critics for Catching Undesirable Agent BehaviorsSubjects: Artificial Intelligence (cs.AI) ; Robotics (cs.RO)
Abstract: Large-scale generative models are shown to be useful for sampling meaningful candidate solutions, yet they often overlook task constraints and user preferences. Their full power is better harnessed when the models are coupled with external verifiers and the final solutions are derived iteratively or progressively according to the verification feedback. In the context of embodied AI, verification often solely involves assessing whether goal conditions specified in the instructions have been met. Nonetheless, for these agents to be seamlessly integrated into daily life, it is crucial to account for a broader range of constraints and preferences beyond bare task success (e.g., a robot should grasp bread with care to avoid significant deformations). However, given the unbounded scope of robot tasks, it is infeasible to construct scripted verifiers akin to those used for explicit-knowledge tasks like the game of Go and theorem proving. This begs the question: when no sound verifier is available, can we use large vision and language models (VLMs), which are approximately omniscient, as scalable Behavior Critics to catch undesirable robot behaviors in videos? To answer this, we first construct a benchmark that contains diverse cases of goal-reaching yet undesirable robot policies. Then, we comprehensively evaluate VLM critics to gain a deeper understanding of their strengths and failure modes. Based on the evaluation, we provide guidelines on how to effectively utilize VLM critiques and showcase a practical way to integrate the feedback into an iterative process of policy refinement. The dataset and codebase are released at: this https URL .
- [72] arXiv:2402.04232 [ pdf , ps , other ]
-
Title: Can Generative Agents Predict Emotion?Comments: 14 pages, 6 figuresSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Large Language Models (LLMs) have demonstrated a number of human-like abilities, however the empathic understanding and emotional state of LLMs is yet to be aligned to that of humans. In this work, we investigate how the emotional state of generative LLM agents evolves as they perceive new events, introducing a novel architecture in which new experiences are compared to past memories. Through this comparison, the agent gains the ability to understand new experiences in context, which according to the appraisal theory of emotion is vital in emotion creation. First, the agent perceives new experiences as time series text data. After perceiving each new input, the agent generates a summary of past relevant memories, referred to as the norm, and compares the new experience to this norm. Through this comparison we can analyse how the agent reacts to the new experience in context. The PANAS, a test of affect, is administered to the agent, capturing the emotional state of the agent after the perception of the new event. Finally, the new experience is then added to the agents memory to be used in the creation of future norms. By creating multiple experiences in natural language from emotionally charged situations, we test the proposed architecture on a wide range of scenarios. The mixed results suggests that introducing context can occasionally improve the emotional alignment of the agent, but further study and comparison with human evaluators is necessary. We hope that this paper is another step towards the alignment of generative agents.
- [73] arXiv:2402.04338 [ pdf , ps , other ]
-
Title: Logical recognition method for solving the problem of identification in the Internet of ThingsComments: I will rework and improve it and post it againSubjects: Artificial Intelligence (cs.AI)
Abstract: A new area of application of methods of algebra of logic and to valued logic, which has emerged recently, is the problem of recognizing a variety of objects and phenomena, medical or technical diagnostics, constructing modern machines, checking test problems, etc., which can be reduced to constructing an optimal extension of the logical function to the entire feature space. For example, in logical recognition systems, logical methods based on discrete analysis and propositional calculus based on it are used to build their own recognition algorithms. In the general case, the use of a logical recognition method provides for the presence of logical connections expressed by the optimal continuation of a k-valued function over the entire feature space, in which the variables are the logical features of the objects or phenomena being recognized. The goal of this work is to develop a logical method for object recognition consisting of a reference table with logical features and classes of non-intersecting objects, which are specified as vectors from a given feature space. The method consists of considering the reference table as a logical function that is not defined everywhere and constructing an optimal continuation of the logical function to the entire feature space, which determines the extension of classes to the entire space.
- [74] arXiv:2402.04370 [ pdf , ps , html , other ]
-
Title: Pedestrian crossing decisions can be explained by bounded optimal decision-making under noisy visual perceptionYueyang Wang , Aravinda Ramakrishnan Srinivasan , Jussi P.P. Jokinen , Antti Oulasvirta , Gustav MarkkulaSubjects: Artificial Intelligence (cs.AI)
Abstract: This paper presents a model of pedestrian crossing decisions, based on the theory of computational rationality. It is assumed that crossing decisions are boundedly optimal, with bounds on optimality arising from human cognitive limitations. While previous models of pedestrian behaviour have been either 'black-box' machine learning models or mechanistic models with explicit assumptions about cognitive factors, we combine both approaches. Specifically, we model mechanistically noisy human visual perception and assumed rewards in crossing, but we use reinforcement learning to learn bounded optimal behaviour policy. The model reproduces a larger number of known empirical phenomena than previous models, in particular: (1) the effect of the time to arrival of an approaching vehicle on whether the pedestrian accepts the gap, the effect of the vehicle's speed on both (2) gap acceptance and (3) pedestrian timing of crossing in front of yielding vehicles, and (4) the effect on this crossing timing of the stopping distance of the yielding vehicle. Notably, our findings suggest that behaviours previously framed as 'biases' in decision-making, such as speed-dependent gap acceptance, might instead be a product of rational adaptation to the constraints of visual perception. Our approach also permits fitting the parameters of cognitive constraints and rewards per individual, to better account for individual differences. To conclude, by leveraging both RL and mechanistic modelling, our model offers novel insights about pedestrian behaviour, and may provide a useful foundation for more accurate and scalable pedestrian models.
- [75] arXiv:2402.04382 [ pdf , ps , html , other ]
-
Title: Counterfactual Generation with Answer Set ProgrammingComments: 16 PagesSubjects: Artificial Intelligence (cs.AI)
Abstract: Machine learning models that automate decision-making are increasingly being used in consequential areas such as loan approvals, pretrial bail approval, hiring, and many more. Unfortunately, most of these models are black-boxes, i.e., they are unable to reveal how they reach these prediction decisions. A need for transparency demands justification for such predictions. An affected individual might also desire explanations to understand why a decision was made. Ethical and legal considerations may further require informing the individual of changes in the input attribute that could be made to produce a desirable outcome. This paper focuses on the latter problem of automatically generating counterfactual explanations. We propose a framework Counterfactual Generation with s(CASP) (CFGS) that utilizes answer set programming (ASP) and the s(CASP) goal-directed ASP system to automatically generate counterfactual explanations from rules generated by rule-based machine learning (RBML) algorithms. In our framework, we show how counterfactual explanations are computed and justified by imagining worlds where some or all factual assumptions are altered/changed. More importantly, we show how we can navigate between these worlds, namely, go from our original world/scenario where we obtain an undesired outcome to the imagined world/scenario where we obtain a desired/favourable outcome.
- [76] arXiv:2402.04464 [ pdf , ps , other ]
-
Title: Ten Hard Problems in Artificial Intelligence We Must Get RightComments: 75 + 19 pagesSubjects: Artificial Intelligence (cs.AI) ; Computers and Society (cs.CY)
Abstract: We explore the AI2050 "hard problems" that block the promise of AI and cause AI risks: (1) developing general capabilities of the systems; (2) assuring the performance of AI systems and their training processes; (3) aligning system goals with human goals; (4) enabling great applications of AI in real life; (5) addressing economic disruptions; (6) ensuring the participation of all; (7) at the same time ensuring socially responsible deployment; (8) addressing any geopolitical disruptions that AI causes; (9) promoting sound governance of the technology; and (10) managing the philosophical disruptions for humans living in the age of AI. For each problem, we outline the area, identify significant recent work, and suggest ways forward. [Note: this paper reviews literature through January 2023.]
- [77] arXiv:2402.04559 [ pdf , ps , other ]
-
Title: Can Large Language Model Agents Simulate Human Trust Behaviors?Chengxing Xie , Canyu Chen , Feiran Jia , Ziyu Ye , Kai Shu , Adel Bibi , Ziniu Hu , Philip Torr , Bernard Ghanem , Guohao LiComments: The first two authors contributed equally. Project website: this https URLSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Abstract: Large Language Model (LLM) agents have been increasingly adopted as simulation tools to model humans in applications such as social science. However, one fundamental question remains: can LLM agents really simulate human behaviors? In this paper, we focus on one of the most critical behaviors in human interactions, trust, and aim to investigate whether or not LLM agents can simulate human trust behaviors. We first find that LLM agents generally exhibit trust behaviors, referred to as agent trust, under the framework of Trust Games, which are widely recognized in behavioral economics. Then, we discover that LLM agents can have high behavioral alignment with humans regarding trust behaviors, particularly for GPT-4, indicating the feasibility to simulate human trust behaviors with LLM agents. In addition, we probe into the biases in agent trust and the differences in agent trust towards agents and humans. We also explore the intrinsic properties of agent trust under conditions including advanced reasoning strategies and external manipulations. We further offer important implications of our discoveries for various scenarios where trust is paramount. Our study provides new insights into the behaviors of LLM agents and the fundamental analogy between LLMs and humans.
- [78] arXiv:2402.04578 [ pdf , ps , other ]
-
Title: S-Agents: Self-organizing Agents in Open-ended EnvironmentsComments: Preview, 15 pages, 12 figureSubjects: Artificial Intelligence (cs.AI) ; Multiagent Systems (cs.MA)
Abstract: Leveraging large language models (LLMs), autonomous agents have significantly improved, gaining the ability to handle a variety of tasks. In open-ended settings, optimizing collaboration for efficiency and effectiveness demands flexible adjustments. Despite this, current research mainly emphasizes fixed, task-oriented workflows and overlooks agent-centric organizational structures. Drawing inspiration from human organizational behavior, we introduce a self-organizing agent system (S-Agents) with a "tree of agents" structure for dynamic workflow, an "hourglass agent architecture" for balancing information priorities, and a "non-obstructive collaboration" method to allow asynchronous task execution among agents. This structure can autonomously coordinate a group of agents, efficiently addressing the challenges of open and dynamic environments without human intervention. Our experiments demonstrate that S-Agents proficiently execute collaborative building tasks and resource collection in the Minecraft environment, validating their effectiveness.
- [79] arXiv:2402.04597 [ pdf , ps , other ]
-
Title: CMSA algorithm for solving the prioritized pairwise test data generation problem in software product linesComments: Preprint of the submitted version of the article in Journal of HeuristicsJournal-ref: J. Heuristics 27(1-2): 229-249 (2021)Subjects: Artificial Intelligence (cs.AI) ; Software Engineering (cs.SE)
Abstract: In Software Product Lines (SPLs) it may be difficult or even impossible to test all the products of the family because of the large number of valid feature combinations that may exist. Thus, we want to find a minimal subset of the product family that allows us to test all these possible combinations (pairwise). Furthermore, when testing a single product is a great effort, it is desirable to first test products composed of a set of priority features. This problem is called Prioritized Pairwise Test Data Generation Problem.
State-of-the-art algorithms based on Integer Linear Programming for this problema are faster enough for small and medium instances. However, there exists some real instances that are too large to be computed with these algorithms in a reasonable time because of the exponential growth of the number of candidate solutions. Also, these heuristics not always lead us to the best solutions. In this work we propose a new approach based on a hybrid metaheuristic algorithm called Construct, Merge, Solve & Adapt. We compare this matheuristic with four algorithms: a Hybrid algorithm based on Integer Linear Programming ((HILP), a Hybrid algorithm based on Integer Nonlinear Programming (HINLP), the Parallel Prioritized Genetic Solver (PPGS), and a greedy algorithm called prioritized-ICPL. The analysis reveals that CMSA results in statistically significantly better quality solutions in most instances and for most levels of weighted coverage, although it requires more execution time. - [80] arXiv:2402.04627 [ pdf , ps , other ]
-
Title: SPARQL Generation: an analysis on fine-tuning OpenLLaMA for Question Answering over a Life Science Knowledge GraphComments: To appear in Proceedings of SWAT4HCLS 2024: Semantic Web Tools and Applications for Healthcare and Life SciencesSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Databases (cs.DB); Information Retrieval (cs.IR)
Abstract: The recent success of Large Language Models (LLM) in a wide range of Natural Language Processing applications opens the path towards novel Question Answering Systems over Knowledge Graphs leveraging LLMs. However, one of the main obstacles preventing their implementation is the scarcity of training data for the task of translating questions into corresponding SPARQL queries, particularly in the case of domain-specific KGs. To overcome this challenge, in this study, we evaluate several strategies for fine-tuning the OpenLlama LLM for question answering over life science knowledge graphs. In particular, we propose an end-to-end data augmentation approach for extending a set of existing queries over a given knowledge graph towards a larger dataset of semantically enriched question-to-SPARQL query pairs, enabling fine-tuning even for datasets where these pairs are scarce. In this context, we also investigate the role of semantic "clues" in the queries, such as meaningful variable names and inline comments. Finally, we evaluate our approach over the real-world Bgee gene expression knowledge graph and we show that semantic clues can improve model performance by up to 33% compared to a baseline with random variable names and no comments included.
- [81] arXiv:2402.04792 [ pdf , ps , html , other ]
-
Title: Direct Language Model Alignment from Online AI FeedbackShangmin Guo , Biao Zhang , Tianlin Liu , Tianqi Liu , Misha Khalman , Felipe Llinares , Alexandre Rame , Thomas Mesnard , Yao Zhao , Bilal Piot , Johan Ferret , Mathieu BlondelComments: 18 pages, 9 figures, 4 tablesSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Abstract: Direct alignment from preferences (DAP) methods, such as DPO, have recently emerged as efficient alternatives to reinforcement learning from human feedback (RLHF), that do not require a separate reward model. However, the preference datasets used in DAP methods are usually collected ahead of training and never updated, thus the feedback is purely offline. Moreover, responses in these datasets are often sampled from a language model distinct from the one being aligned, and since the model evolves over training, the alignment phase is inevitably off-policy. In this study, we posit that online feedback is key and improves DAP methods. Our method, online AI feedback (OAIF), uses an LLM as annotator: on each training iteration, we sample two responses from the current model and prompt the LLM annotator to choose which one is preferred, thus providing online feedback. Despite its simplicity, we demonstrate via human evaluation in several tasks that OAIF outperforms both offline DAP and RLHF methods. We further show that the feedback leveraged in OAIF is easily controllable, via instruction prompts to the LLM annotator.
- [82] arXiv:2402.04832 [ pdf , ps , other ]
-
Title: Structured d-DNNF Is Not Closed Under NegationComments: 9 pages, 2 figuresSubjects: Artificial Intelligence (cs.AI) ; Logic in Computer Science (cs.LO)
Abstract: Both structured d-DNNF and SDD can be exponentially more succinct than OBDD. Moreover, SDD is essentially as tractable as OBDD. But this has left two important open questions. Firstly, does OBDD support more tractable transformations than structured d-DNNF? And secondly, is structured d-DNNF more succinct than SDD? In this paper, we answer both questions in the affirmative. For the first question we show that, unlike OBDD, structured d-DNNF does not support polytime negation, disjunction, or existential quantification operations. As a corollary, we deduce that there are functions with an equivalent polynomial-sized structured d-DNNF but with no such representation as an SDD, thus answering the second question. We also lift this second result to arithmetic circuits (AC) to show a succinctness gap between PSDD and the monotone AC analogue to structured d-DNNF.
- [83] arXiv:2402.04856 [ pdf , ps , other ]
-
Title: Explaining Learned Reward Functions with Counterfactual TrajectoriesSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Learning rewards from human behaviour or feedback is a promising approach to aligning AI systems with human values but fails to consistently extract correct reward functions. Interpretability tools could enable users to understand and evaluate possible flaws in learned reward functions. We propose Counterfactual Trajectory Explanations (CTEs) to interpret reward functions in reinforcement learning by contrasting an original with a counterfactual partial trajectory and the rewards they each receive. We derive six quality criteria for CTEs and propose a novel Monte-Carlo-based algorithm for generating CTEs that optimises these quality criteria. Finally, we measure how informative the generated explanations are to a proxy-human model by training it on CTEs. CTEs are demonstrably informative for the proxy-human model, increasing the similarity between its predictions and the reward function on unseen trajectories. Further, it learns to accurately judge differences in rewards between trajectories and generalises to out-of-distribution examples. Although CTEs do not lead to a perfect understanding of the reward, our method, and more generally the adaptation of XAI methods, are presented as a fruitful approach for interpreting learned reward functions.
- [84] arXiv:2402.04858 [ pdf , ps , other ]
-
Title: CodeIt: Self-Improving Language Models with Prioritized Hindsight ReplayNatasha Butt , Blazej Manczak , Auke Wiggers , Corrado Rainone , David Zhang , Michaël Defferrard , Taco CohenComments: 8 pages, 11 figuresSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Large language models are increasingly solving tasks that are commonly believed to require human-level reasoning ability. However, these models still perform very poorly on benchmarks of general intelligence such as the Abstraction and Reasoning Corpus (ARC). In this paper, we approach ARC as a programming-by-examples problem, and introduce a novel and scalable method for language model self-improvement called Code Iteration (CodeIt). Our method iterates between 1) program sampling and hindsight relabeling, and 2) learning from prioritized experience replay. By relabeling the goal of an episode (i.e., the target program output given input) to the realized output produced by the sampled program, our method effectively deals with the extreme sparsity of rewards in program synthesis. Applying CodeIt to the ARC dataset, we demonstrate that prioritized hindsight replay, along with pre-training and data-augmentation, leads to successful inter-task generalization. CodeIt is the first neuro-symbolic approach that scales to the full ARC evaluation dataset. Our method solves 15% of ARC evaluation tasks, achieving state-of-the-art performance and outperforming existing neural and symbolic baselines.
- [85] arXiv:2402.04870 [ pdf , ps , other ]
-
Title: Embedding Knowledge Graphs in Degenerate Clifford AlgebrasSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Clifford algebras are a natural generalization of the real numbers, the complex numbers, and the quaternions. So far, solely Clifford algebras of the form $Cl_{p,q}$ (i.e., algebras without nilpotent base vectors) have been studied in the context of knowledge graph embeddings. We propose to consider nilpotent base vectors with a nilpotency index of two. In these spaces, denoted $Cl_{p,q,r}$, allows generalizing over approaches based on dual numbers (which cannot be modelled using $Cl_{p,q}$) and capturing patterns that emanate from the absence of higher-order interactions between real and complex parts of entity embeddings. We design two new models for the discovery of the parameters $p$, $q$, and $r$. The first model uses a greedy search to optimize $p$, $q$, and $r$. The second predicts $(p, q,r)$ based on an embedding of the input knowledge graph computed using neural networks. The results of our evaluation on seven benchmark datasets suggest that nilpotent vectors can help capture embeddings better. Our comparison against the state of the art suggests that our approach generalizes better than other approaches on all datasets w.r.t. the MRR it achieves on validation data. We also show that a greedy search suffices to discover values of $p$, $q$ and $r$ that are close to optimal.
- [86] arXiv:2402.04874 [ pdf , ps , html , other ]
-
Title: Choosing a Classical Planner with Graph Neural NetworksSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Online planner selection is the task of choosing a solver out of a predefined set for a given planning problem. As planning is computationally hard, the performance of solvers varies greatly on planning problems. Thus, the ability to predict their performance on a given problem is of great importance. While a variety of learning methods have been employed, for classical cost-optimal planning the prevailing approach uses Graph Neural Networks (GNNs). In this work, we continue the line of work on using GNNs for online planner selection. We perform a thorough investigation of the impact of the chosen GNN model, graph representation and node features, as well as prediction task. Going further, we propose using the graph representation obtained by a GNN as an input to the Extreme Gradient Boosting (XGBoost) model, resulting in a more resource-efficient yet accurate approach. We show the effectiveness of a variety of GNN-based online planner selection methods, opening up new exciting avenues for research on online planner selection.
- [87] arXiv:2402.04892 [ pdf , ps , other ]
-
Title: A Unified Framework for Probabilistic Verification of AI Systems via Weighted Model IntegrationSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: The probabilistic formal verification (PFV) of AI systems is in its infancy. So far, approaches have been limited to ad-hoc algorithms for specific classes of models and/or properties.
We propose a unifying framework for the PFV of AI systems based onWeighted Model Integration (WMI), which allows to frame the problem in very general terms.
Crucially, this reduction enables the verification of many properties of interest, like fairness, robustness or monotonicity, over a wide range of machine learning models, without making strong distributional assumptions.
We support the generality of the approach by solving multiple verification tasks with a single, off-the-shelf WMI solver, then discuss the scalability challenges and research directions related to this promising framework. - [88] arXiv:2402.04898 [ pdf , ps , other ]
-
Title: The Strain of Success: A Predictive Model for Injury Risk Mitigation and Team Success in SoccerComments: 19 pages (16 main, 2 references, 1 appendix), 10 figures (9 main, 1 appendix). Accepted at the MIT Sloan Sports Analytics Conference 2024 Research Paper CompetitionSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: In this paper, we present a novel sequential team selection model in soccer. Specifically, we model the stochastic process of player injury and unavailability using player-specific information learned from real-world soccer data. Monte-Carlo Tree Search is used to select teams for games that optimise long-term team performance across a soccer season by reasoning over player injury probability. We validate our approach compared to benchmark solutions for the 2018/19 English Premier League season. Our model achieves similar season expected points to the benchmark whilst reducing first-team injuries by ~13% and the money inefficiently spent on injured players by ~11% - demonstrating the potential to reduce costs and improve player welfare in real-world soccer teams.
- [89] arXiv:2402.04938 [ pdf , ps , other ]
-
Title: An approach to automated videogame beta testingJournal-ref: Entertainment Computing, Elsevier. 18. pp 79 to 92. (2017)Subjects: Artificial Intelligence (cs.AI)
Abstract: Videogames developed in the 1970s and 1980s were modest programs created in a couple of months by a single person, who played the roles of designer, artist and programmer. Since then, videogames have evolved to become a multi-million dollar industry. Today, AAA game development involves hundreds of people working together over several years. Management and engineering requirements have changed at the same pace. Although many of the processes have been adapted over time, this is not quite true for quality assurance tasks, which are still done mainly manually by human beta testers due to the specific peculiarities of videogames. This paper presents an approach to automate this beta testing.
- [90] arXiv:2402.04971 [ pdf , ps , other ]
-
Title: Multi-Sender Persuasion -- A Computational PerspectiveSubjects: Artificial Intelligence (cs.AI) ; Computer Science and Game Theory (cs.GT)
Abstract: We consider multiple senders with informational advantage signaling to convince a single self-interested actor towards certain actions. Generalizing the seminal Bayesian Persuasion framework, such settings are ubiquitous in computational economics, multi-agent learning, and machine learning with multiple objectives. The core solution concept here is the Nash equilibrium of senders' signaling policies. Theoretically, we prove that finding an equilibrium in general is PPAD-Hard; in fact, even computing a sender's best response is NP-Hard. Given these intrinsic difficulties, we turn to finding local Nash equilibria. We propose a novel differentiable neural network to approximate this game's non-linear and discontinuous utilities. Complementing this with the extra-gradient algorithm, we discover local equilibria that Pareto dominates full-revelation equilibria and those found by existing neural networks. Broadly, our theoretical and empirical contributions are of interest to a large class of economic problems.
- [91] arXiv:2402.05048 [ pdf , ps , other ]
-
Title: How VADER is your AI? Towards a definition of artificial intelligence systems appropriate for regulationLeonardo C. T. Bezerra , Alexander E. I. Brownlee , Luana Ferraz Alvarenga , Renan Cipriano Moioli , Thais Vasconcelos BatistaSubjects: Artificial Intelligence (cs.AI)
Abstract: Artificial intelligence (AI) has driven many information and communication technology (ICT) breakthroughs. Nonetheless, the scope of ICT systems has expanded far beyond AI since the Turing test proposal. Critically, recent AI regulation proposals adopt AI definitions affecting ICT techniques, approaches, and systems that are not AI. In some cases, even works from mathematics, statistics, and engineering would be affected. Worryingly, AI misdefinitions are observed from Western societies to the Global South. In this paper, we propose a framework to score how validated as appropriately-defined for regulation (VADER) an AI definition is. Our online, publicly-available VADER framework scores the coverage of premises that should underlie AI definitions for regulation, which aim to (i) reproduce principles observed in other successful technology regulations, and (ii) include all AI techniques and approaches while excluding non-AI works. Regarding the latter, our score is based on a dataset of representative AI, non-AI ICT, and non-ICT examples. We demonstrate our contribution by reviewing the AI regulation proposals of key players, namely the United States, United Kingdom, European Union, and Brazil. Importantly, none of the proposals assessed achieve the appropriateness score, ranging from a revision need to a concrete risk to ICT systems and works from other fields.
- [92] arXiv:2402.05070 [ pdf , ps , html , other ]
-
Title: A Roadmap to Pluralistic AlignmentTaylor Sorensen , Jared Moore , Jillian Fisher , Mitchell Gordon , Niloofar Mireshghallah , Christopher Michael Rytting , Andre Ye , Liwei Jiang , Ximing Lu , Nouha Dziri , Tim Althoff , Yejin ChoiSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Information Retrieval (cs.IR)
Abstract: With increased power and prevalence of AI systems, it is ever more critical that AI systems are designed to serve all, i.e., people with diverse values and perspectives. However, aligning models to serve pluralistic human values remains an open research question. In this piece, we propose a roadmap to pluralistic alignment, specifically using language models as a test bed. We identify and formalize three possible ways to define and operationalize pluralism in AI systems: 1) Overton pluralistic models that present a spectrum of reasonable responses; 2) Steerably pluralistic models that can steer to reflect certain perspectives; and 3) Distributionally pluralistic models that are well-calibrated to a given population in distribution. We also propose and formalize three possible classes of pluralistic benchmarks: 1) Multi-objective benchmarks, 2) Trade-off steerable benchmarks, which incentivize models to steer to arbitrary trade-offs, and 3) Jury-pluralistic benchmarks which explicitly model diverse human ratings. We use this framework to argue that current alignment techniques may be fundamentally limited for pluralistic AI; indeed, we highlight empirical evidence, both from our own experiments and from other work, that standard alignment procedures might reduce distributional pluralism in models, motivating the need for further research on pluralistic alignment.
- [93] arXiv:2402.05121 [ pdf , ps , html , other ]
-
Title: Large Language Model for Table Processing: A SurveySubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Tables, typically two-dimensional and structured to store large amounts of data, are essential in daily activities like database queries, spreadsheet calculations, and generating reports from web tables. Automating these table-centric tasks with Large Language Models (LLMs) offers significant public benefits, garnering interest from academia and industry. This survey provides an extensive overview of table tasks, encompassing not only the traditional areas like table question answering (Table QA) and fact verification, but also newly emphasized aspects such as table manipulation and advanced table data analysis. Additionally, it goes beyond the early strategies of pre-training and fine-tuning small language models, to include recent paradigms in LLM usage. The focus here is particularly on instruction-tuning, prompting, and agent-based approaches within the realm of LLMs. Finally, we highlight several challenges, ranging from private deployment and efficient inference to the development of extensive benchmarks for table manipulation and advanced data analysis.
- [94] arXiv:2402.05135 [ pdf , ps , html , other ]
-
Title: CADReN: Contextual Anchor-Driven Relational Network for Controllable Cross-Graphs Node Importance EstimationComments: 8 pages, 6 figuresSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Information Retrieval (cs.IR)
Abstract: Node Importance Estimation (NIE) is crucial for integrating external information into Large Language Models through Retriever-Augmented Generation. Traditional methods, focusing on static, single-graph characteristics, lack adaptability to new graphs and user-specific requirements. CADReN, our proposed method, addresses these limitations by introducing a Contextual Anchor (CA) mechanism. This approach enables the network to assess node importance relative to the CA, considering both structural and semantic features within Knowledge Graphs (KGs). Extensive experiments show that CADReN achieves better performance in cross-graph NIE task, with zero-shot prediction ability. CADReN is also proven to match the performance of previous models on single-graph NIE task. Additionally, we introduce and opensource two new datasets, RIC200 and WK1K, specifically designed for cross-graph NIE research, providing a valuable resource for future developments in this domain.
- [95] arXiv:2402.05138 [ pdf , ps , html , other ]
-
Title: SceMQA: A Scientific College Entrance Level Multimodal Question Answering BenchmarkZhenwen Liang , Kehan Guo , Gang Liu , Taicheng Guo , Yujun Zhou , Tianyu Yang , Jiajun Jiao , Renjie Pi , Jipeng Zhang , Xiangliang ZhangComments: Work in progressSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: The paper introduces SceMQA, a novel benchmark for scientific multimodal question answering at the college entrance level. It addresses a critical educational phase often overlooked in existing benchmarks, spanning high school to pre-college levels. SceMQA focuses on core science subjects including Mathematics, Physics, Chemistry, and Biology. It features a blend of multiple-choice and free-response formats, ensuring a comprehensive evaluation of AI models' abilities. Additionally, our benchmark provides specific knowledge points for each problem and detailed explanations for each answer. SceMQA also uniquely presents problems with identical contexts but varied questions to facilitate a more thorough and accurate assessment of reasoning capabilities. In the experiment, we evaluate both open-source and close-source state-of-the-art Multimodal Large Language Models (MLLMs), across various experimental settings. The results show that further research and development are needed in developing more capable MLLM, as highlighted by only 50% to 60% accuracy achieved by the strongest models. Our benchmark and analysis will be available at this https URL
- [96] arXiv:2402.05307 [ pdf , ps , html , other ]
-
Title: Three Pathways to Neurosymbolic Reinforcement Learning with Interpretable Model and Policy NetworksSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Neurosymbolic AI combines the interpretability, parsimony, and explicit reasoning of classical symbolic approaches with the statistical learning of data-driven neural approaches. Models and policies that are simultaneously differentiable and interpretable may be key enablers of this marriage. This paper demonstrates three pathways to implementing such models and policies in a real-world reinforcement learning setting. Specifically, we study a broad class of neural networks that build interpretable semantics directly into their architecture. We reveal and highlight both the potential and the essential difficulties of combining logic, simulation, and learning. One lesson is that learning benefits from continuity and differentiability, but classical logic is discrete and non-differentiable. The relaxation to real-valued, differentiable representations presents a trade-off; the more learnable, the less interpretable. Another lesson is that using logic in the context of a numerical simulation involves a non-trivial mapping from raw (e.g., real-valued time series) simulation data to logical predicates. Some open questions this note exposes include: What are the limits of rule-based controllers, and how learnable are they? Do the differentiable interpretable approaches discussed here scale to large, complex, uncertain systems? Can we truly achieve interpretability? We highlight these and other themes across the three approaches.
- [97] arXiv:2402.05346 [ pdf , ps , html , other ]
-
Title: KIX: A Metacognitive Generalization FrameworkSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Robotics (cs.RO)
Abstract: Humans and other animals aptly exhibit general intelligence behaviors in solving a variety of tasks with flexibility and ability to adapt to novel situations by reusing and applying high level knowledge acquired over time. But artificial agents are more of a specialist, lacking such generalist behaviors. Artificial agents will require understanding and exploiting critical structured knowledge representations. We present a metacognitive generalization framework, Knowledge-Interaction-eXecution (KIX), and argue that interactions with objects leveraging type space facilitate the learning of transferable interaction concepts and generalization. It is a natural way of integrating knowledge into reinforcement learning and promising to act as an enabler for autonomous and generalist behaviors in artificial intelligence systems.
- [98] arXiv:2402.05359 [ pdf , ps , html , other ]
-
Title: Prompting with Divide-and-Conquer Program Makes Large Language Models Discerning to Hallucination and DeceptionComments: PreprintSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Foundation models, such as Large language Models (LLMs), have attracted significant amount of interest due to their large number of applications. Existing works show that appropriate prompt design, such as Chain-of-Thoughts, can unlock LLM's powerful capacity in diverse areas. However, when handling tasks involving repetitive sub-tasks and/or deceptive contents, such as arithmetic calculation and article-level fake news detection, existing prompting strategies either suffers from insufficient expressive power or intermediate errors triggered by hallucination. To make LLM more discerning to such intermediate errors, we propose to guide LLM with a Divide-and-Conquer program that simultaneously ensures superior expressive power and disentangles task decomposition, sub-task resolution, and resolution assembly process. Theoretic analysis reveals that our strategy can guide LLM to extend the expressive power of fixed-depth Transformer. Experiments indicate that our proposed method can achieve better performance than typical prompting strategies in tasks bothered by intermediate errors and deceptive contents, such as large integer multiplication, hallucination detection and misinformation detection.
- [99] arXiv:2402.05391 [ pdf , ps , html , other ]
-
Title: Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive SurveyZhuo Chen , Yichi Zhang , Yin Fang , Yuxia Geng , Lingbing Guo , Xiang Chen , Qian Li , Wen Zhang , Jiaoyan Chen , Yushan Zhu , Jiaqi Li , Xiaoze Liu , Jeff Z. Pan , Ningyu Zhang , Huajun ChenComments: Ongoing work; 41 pages (Main Text), 55 pages (Total), 11 Tables, 13 Figures, 619 citations; Paper list is available at this https URLSubjects: Artificial Intelligence (cs.AI) ; Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Abstract: Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the semantic web community's exploration into multi-modal dimensions unlocking new avenues for innovation. In this survey, we carefully review over 300 articles, focusing on KG-aware research in two principal aspects: KG-driven Multi-Modal (KG4MM) learning, where KGs support multi-modal tasks, and Multi-Modal Knowledge Graph (MM4KG), which extends KG studies into the MMKG realm. We begin by defining KGs and MMKGs, then explore their construction progress. Our review includes two primary task categories: KG-aware multi-modal learning tasks, such as Image Classification and Visual Question Answering, and intrinsic MMKG tasks like Multi-modal Knowledge Graph Completion and Entity Alignment, highlighting specific research trajectories. For most of these tasks, we provide definitions, evaluation benchmarks, and additionally outline essential insights for conducting relevant research. Finally, we discuss current challenges and identify emerging trends, such as progress in Large Language Modeling and Multi-modal Pre-training strategies. This survey aims to serve as a comprehensive reference for researchers already involved in or considering delving into KG and multi-modal learning research, offering insights into the evolving landscape of MMKG research and supporting future work.
- [100] arXiv:2402.05467 [ pdf , ps , other ]
-
Title: Rapid Optimization for Jailbreaking LLMs via Subconscious Exploitation and EchopraxiaGuangyu Shen , Siyuan Cheng , Kaiyuan Zhang , Guanhong Tao , Shengwei An , Lu Yan , Zhuo Zhang , Shiqing Ma , Xiangyu ZhangSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Cryptography and Security (cs.CR)
Abstract: Large Language Models (LLMs) have become prevalent across diverse sectors, transforming human life with their extraordinary reasoning and comprehension abilities. As they find increased use in sensitive tasks, safety concerns have gained widespread attention. Extensive efforts have been dedicated to aligning LLMs with human moral principles to ensure their safe deployment. Despite their potential, recent research indicates aligned LLMs are prone to specialized jailbreaking prompts that bypass safety measures to elicit violent and harmful content. The intrinsic discrete nature and substantial scale of contemporary LLMs pose significant challenges in automatically generating diverse, efficient, and potent jailbreaking prompts, representing a continuous obstacle. In this paper, we introduce RIPPLE (Rapid Optimization via Subconscious Exploitation and Echopraxia), a novel optimization-based method inspired by two psychological concepts: subconsciousness and echopraxia, which describe the processes of the mind that occur without conscious awareness and the involuntary mimicry of actions, respectively. Evaluations across 6 open-source LLMs and 4 commercial LLM APIs show RIPPLE achieves an average Attack Success Rate of 91.5\%, outperforming five current methods by up to 47.0\% with an 8x reduction in overhead. Furthermore, it displays significant transferability and stealth, successfully evading established detection mechanisms. The code of our work is available at \url{ this https URL }
- [101] arXiv:2402.05605 [ pdf , ps , other ]
-
Title: Optimizing Delegation in Collaborative Human-AI Hybrid TeamsSubjects: Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Abstract: When humans and autonomous systems operate together as what we refer to as a hybrid team, we of course wish to ensure the team operates successfully and effectively. We refer to team members as agents. In our proposed framework, we address the case of hybrid teams in which, at any time, only one team member (the control agent) is authorized to act as control for the team. To determine the best selection of a control agent, we propose the addition of an AI manager (via Reinforcement Learning) which learns as an outside observer of the team. The manager learns a model of behavior linking observations of agent performance and the environment/world the team is operating in, and from these observations makes the most desirable selection of a control agent. We restrict the manager task by introducing a set of constraints. The manager constraints indicate acceptable team operation, so a violation occurs if the team enters a condition which is unacceptable and requires manager intervention. To ensure minimal added complexity or potential inefficiency for the team, the manager should attempt to minimize the number of times the team reaches a constraint violation and requires subsequent manager intervention. Therefore our manager is optimizing its selection of authorized agents to boost overall team performance while minimizing the frequency of manager intervention. We demonstrate our manager performance in a simulated driving scenario representing the case of a hybrid team of agents composed of a human driver and autonomous driving system. We perform experiments for our driving scenario with interfering vehicles, indicating the need for collision avoidance and proper speed control. Our results indicate a positive impact of our manager, with some cases resulting in increased team performance up to ~187% that of the best solo agent performance.
- [102] arXiv:2402.05786 [ pdf , ps , other ]
-
Title: Prompting Fairness: Artificial Intelligence as Game PlayersComments: preprintSubjects: Artificial Intelligence (cs.AI) ; Computer Science and Game Theory (cs.GT)
Abstract: Utilitarian games such as dictator games to measure fairness have been studied in the social sciences for decades. These games have given us insight into not only how humans view fairness but also in what conditions the frequency of fairness, altruism and greed increase or decrease. While these games have traditionally been focused on humans, the rise of AI gives us the ability to study how these models play these games. AI is becoming a constant in human interaction and examining how these models portray fairness in game play can give us some insight into how AI makes decisions. Over 101 rounds of the dictator game, I conclude that AI has a strong sense of fairness that is dependant of it it deems the person it is playing with as trustworthy, framing has a strong effect on how much AI gives a recipient when designated the trustee, and there may be evidence that AI experiences inequality aversion just as humans.
- [103] arXiv:2402.05808 [ pdf , ps , html , other ]
-
Title: Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement LearningZhiheng Xi , Wenxiang Chen , Boyang Hong , Senjie Jin , Rui Zheng , Wei He , Yiwen Ding , Shichun Liu , Xin Guo , Junzhe Wang , Honglin Guo , Wei Shen , Xiaoran Fan , Yuhao Zhou , Shihan Dou , Xiao Wang , Xinbo Zhang , Peng Sun , Tao Gui , Qi Zhang , Xuanjing HuangComments: Preprint. Codes released: this https URLSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: In this paper, we propose R$^3$: Learning Reasoning through Reverse Curriculum Reinforcement Learning (RL), a novel method that employs only outcome supervision to achieve the benefits of process supervision for large language models. The core challenge in applying RL to complex reasoning is to identify a sequence of actions that result in positive rewards and provide appropriate supervision for optimization. Outcome supervision provides sparse rewards for final results without identifying error locations, whereas process supervision offers step-wise rewards but requires extensive manual annotation. R$^3$ overcomes these limitations by learning from correct demonstrations. Specifically, R$^3$ progressively slides the start state of reasoning from a demonstration's end to its beginning, facilitating easier model exploration at all stages. Thus, R$^3$ establishes a step-wise curriculum, allowing outcome supervision to offer step-level signals and precisely pinpoint errors. Using Llama2-7B, our method surpasses RL baseline on eight reasoning tasks by $4.1$ points on average. Notebaly, in program-based reasoning on GSM8K, it exceeds the baseline by $4.2$ points across three backbone models, and without any extra data, Codellama-7B + R$^3$ performs comparable to larger models or closed-source models.
- [104] arXiv:2402.05829 [ pdf , ps , other ]
-
Title: Limitations of Agents Simulated by Predictive ModelsSubjects: Artificial Intelligence (cs.AI)
Abstract: There is increasing focus on adapting predictive models into agent-like systems, most notably AI assistants based on language models. We outline two structural reasons for why these models can fail when turned into agents. First, we discuss auto-suggestive delusions. Prior work has shown theoretically that models fail to imitate agents that generated the training data if the agents relied on hidden observations: the hidden observations act as confounding variables, and the models treat actions they generate as evidence for nonexistent observations. Second, we introduce and formally study a related, novel limitation: predictor-policy incoherence. When a model generates a sequence of actions, the model's implicit prediction of the policy that generated those actions can serve as a confounding variable. The result is that models choose actions as if they expect future actions to be suboptimal, causing them to be overly conservative. We show that both of those failures are fixed by including a feedback loop from the environment, that is, re-training the models on their own actions. We give simple demonstrations of both limitations using Decision Transformers and confirm that empirical results agree with our conceptual and formal analysis. Our treatment provides a unifying view of those failure modes, and informs the question of why fine-tuning offline learned policies with online learning makes them more effective.
- [105] arXiv:2402.05863 [ pdf , ps , other ]
-
Title: How Well Can LLMs Negotiate? NegotiationArena Platform and AnalysisFederico Bianchi , Patrick John Chia , Mert Yuksekgonul , Jacopo Tagliabue , Dan Jurafsky , James ZouSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Computer Science and Game Theory (cs.GT)
Abstract: Negotiation is the basis of social interactions; humans negotiate everything from the price of cars to how to share common resources. With rapidly growing interest in using large language models (LLMs) to act as agents on behalf of human users, such LLM agents would also need to be able to negotiate. In this paper, we study how well LLMs can negotiate with each other. We develop NegotiationArena: a flexible framework for evaluating and probing the negotiation abilities of LLM agents. We implemented three types of scenarios in NegotiationArena to assess LLM's behaviors in allocating shared resources (ultimatum games), aggregate resources (trading games) and buy/sell goods (price negotiations). Each scenario allows for multiple turns of flexible dialogues between LLM agents to allow for more complex negotiations. Interestingly, LLM agents can significantly boost their negotiation outcomes by employing certain behavioral tactics. For example, by pretending to be desolate and desperate, LLMs can improve their payoffs by 20\% when negotiating against the standard GPT-4. We also quantify irrational negotiation behaviors exhibited by the LLM agents, many of which also appear in humans. Together, \NegotiationArena offers a new environment to investigate LLM interactions, enabling new insights into LLM's theory of mind, irrationality, and reasoning abilities.
- [106] arXiv:2402.05894 [ pdf , ps , html , other ]
-
Title: Large Language Model Meets Graph Neural Network in Knowledge DistillationComments: 17 pages, 6 figures, 4 tablesSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Despite recent community revelations about the advancements and potential applications of Large Language Models (LLMs) in understanding Text-Attributed Graph (TAG), the deployment of LLMs for production is hindered by its high computational and storage requirements, as well as long latencies during model inference. Simultaneously, although traditional Graph Neural Networks (GNNs) are light weight and adept at learning structural features of graphs, their ability to grasp the complex semantics in TAG is somewhat constrained for real applications. To address these limitations, we concentrate on the downstream task of node classification in TAG and propose a novel graph knowledge distillation framework, termed Linguistic Graph Knowledge Distillation (LinguGKD), using LLMs as teacher models and GNNs as student models for knowledge distillation. It involves TAG-oriented instruction tuning of LLM on designed tailored prompts, followed by propagating knowledge and aligning the hierarchically learned node features from the teacher LLM to the student GNN in latent space, employing a layer-adaptive contrastive learning strategy. Through extensive experiments on a variety of LLM and GNN models and multiple benchmark datasets, the proposed LinguGKD significantly boosts the student GNN's predictive accuracy and convergence rate, without the need of extra data or model parameters. Compared to teacher LLM, distilled GNN achieves superior inference speed equipped with much fewer computing and storage demands, when surpassing the teacher LLM's classification accuracy on some of benchmark datasets.
- [107] arXiv:2402.05929 [ pdf , ps , other ]
-
Title: An Interactive Agent Foundation ModelZane Durante , Bidipta Sarkar , Ran Gong , Rohan Taori , Yusuke Noda , Paul Tang , Ehsan Adeli , Shrinidhi Kowshika Lakshmikanth , Kevin Schulman , Arnold Milstein , Demetri Terzopoulos , Ade Famoti , Noboru Kuno , Ashley Llorens , Hoi Vo , Katsu Ikeuchi , Li Fei-Fei , Jianfeng Gao , Naoki Wake , Qiuyuan HuangSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Robotics (cs.RO)
Abstract: The development of artificial intelligence systems is transitioning from creating static, task-specific models to dynamic, agent-based systems capable of performing well in a wide range of applications. We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents across a wide range of domains, datasets, and tasks. Our training paradigm unifies diverse pre-training strategies, including visual masked auto-encoders, language modeling, and next-action prediction, enabling a versatile and adaptable AI framework. We demonstrate the performance of our framework across three separate domains -- Robotics, Gaming AI, and Healthcare. Our model demonstrates its ability to generate meaningful and contextually relevant outputs in each area. The strength of our approach lies in its generality, leveraging a variety of data sources such as robotics sequences, gameplay data, large-scale video datasets, and textual information for effective multimodal and multi-task learning. Our approach provides a promising avenue for developing generalist, action-taking, multimodal systems.
- [108] arXiv:2402.06025 [ pdf , ps , html , other ]
-
Title: Doing Experiments and Revising Rules with Natural Language and Probabilistic ReasoningSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: We build a computational model of how humans actively infer hidden rules by doing experiments. The basic principles behind the model is that, even if the rule is deterministic, the learner considers a broader space of fuzzy probabilistic rules, which it represents in natural language, and updates its hypotheses online after each experiment according to approximately Bayesian principles. In the same framework we also model experiment design according to information-theoretic criteria. We find that the combination of these three principles -- explicit hypotheses, probabilistic rules, and online updates -- can explain human performance on a Zendo-style task, and that removing any of these components leaves the model unable to account for the data.
- [109] arXiv:2402.06044 [ pdf , ps , other ]
-
Title: OpenToM: A Comprehensive Benchmark for Evaluating Theory-of-Mind Reasoning Capabilities of Large Language ModelsSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Neural Theory-of-Mind (N-ToM), machine's ability to understand and keep track of the mental states of others, is pivotal in developing socially intelligent agents. However, prevalent N-ToM benchmarks have several shortcomings, including the presence of ambiguous and artificial narratives, absence of personality traits and preferences, a lack of questions addressing characters' psychological mental states, and limited diversity in the questions posed. In response to these issues, we construct OpenToM, a new benchmark for assessing N-ToM with (1) longer and clearer narrative stories, (2) characters with explicit personality traits, (3) actions that are triggered by character intentions, and (4) questions designed to challenge LLMs' capabilities of modeling characters' mental states of both the physical and psychological world. Using OpenToM, we reveal that state-of-the-art LLMs thrive at modeling certain aspects of mental states in the physical world but fall short when tracking characters' mental states in the psychological world.
- [110] arXiv:2402.06049 [ pdf , ps , html , other ]
-
Title: Limits of Large Language Models in Debating HumansComments: 23 pages, 6 figures, 3 tables, 21 pages of supplemental materials, 8 supplemental figures, 6 supplemental tablesSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Applications (stat.AP)
Abstract: Large Language Models (LLMs) have shown remarkable promise in their ability to interact proficiently with humans. Subsequently, their potential use as artificial confederates and surrogates in sociological experiments involving conversation is an exciting prospect. But how viable is this idea? This paper endeavors to test the limits of current-day LLMs with a pre-registered study integrating real people with LLM agents acting as people. The study focuses on debate-based opinion consensus formation in three environments: humans only, agents and humans, and agents only. Our goal is to understand how LLM agents influence humans, and how capable they are in debating like humans. We find that LLMs can blend in and facilitate human productivity but are less convincing in debate, with their behavior ultimately deviating from human's. We elucidate these primary failings and anticipate that LLMs must evolve further before being viable debaters.
- [111] arXiv:2402.06097 [ pdf , ps , html , other ]
-
Title: TWIG: Towards pre-hoc Hyperparameter Optimisation and Cross-Graph Generalisation via Simulated KGE ModelsComments: This article was accepted for publication at IEEE ICSC 2024Subjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: In this paper we introduce TWIG (Topologically-Weighted Intelligence Generation), a novel, embedding-free paradigm for simulating the output of KGEs that uses a tiny fraction of the parameters. TWIG learns weights from inputs that consist of topological features of the graph data, with no coding for latent representations of entities or edges. Our experiments on the UMLS dataset show that a single TWIG neural network can predict the results of state-of-the-art ComplEx-N3 KGE model nearly exactly on across all hyperparameter configurations. To do this it uses a total of 2590 learnable parameters, but accurately predicts the results of 1215 different hyperparameter combinations with a combined cost of 29,322,000 parameters. Based on these results, we make two claims: 1) that KGEs do not learn latent semantics, but only latent representations of structural patterns; 2) that hyperparameter choice in KGEs is a deterministic function of the KGE model and graph structure. We further hypothesise that, as TWIG can simulate KGEs without embeddings, that node and edge embeddings are not needed to learn to accurately predict new facts in KGs. Finally, we formulate all of our findings under the umbrella of the ``Structural Generalisation Hypothesis", which suggests that ``twiggy" embedding-free / data-structure-based learning methods can allow a single neural network to simulate KGE performance, and perhaps solve the Link Prediction task, across many KGs from diverse domains and with different semantics.
- [112] arXiv:2402.06098 [ pdf , ps , html , other ]
-
Title: Veni, Vidi, Vici: Solving the Myriad of Challenges before Knowledge Graph LearningComments: This article was accepted for publication at IEEE ICSC 2024, and is being made available as an author preprint. As soon as it is published by IEEE, this registry will be updated in accordance with the IEEE copyright agreementSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Knowledge Graphs (KGs) have become increasingly common for representing large-scale linked data. However, their immense size has required graph learning systems to assist humans in analysis, interpretation, and pattern detection. While there have been promising results for researcher- and clinician- empowerment through a variety of KG learning systems, we identify four key deficiencies in state-of-the-art graph learning that simultaneously limit KG learning performance and diminish the ability of humans to interface optimally with these learning systems. These deficiencies are: 1) lack of expert knowledge integration, 2) instability to node degree extremity in the KG, 3) lack of consideration for uncertainty and relevance while learning, and 4) lack of explainability. Furthermore, we characterise state-of-the-art attempts to solve each of these problems and note that each attempt has largely been isolated from attempts to solve the other problems. Through a formalisation of these problems and a review of the literature that addresses them, we adopt the position that not only are deficiencies in these four key areas holding back human-KG empowerment, but that the divide-and-conquer approach to solving these problems as individual units rather than a whole is a significant barrier to the interface between humans and KG learning systems. We propose that it is only through integrated, holistic solutions to the limitations of KG learning systems that human and KG learning co-empowerment will be efficiently affected. We finally present our "Veni, Vidi, Vici" framework that sets a roadmap for effectively and efficiently shifting to a holistic co-empowerment model in both the KG learning and the broader machine learning domain.
- [113] arXiv:2402.06147 [ pdf , ps , html , other ]
-
Title: DeAL: Decoding-time Alignment for Large Language ModelsJames Y. Huang , Sailik Sengupta , Daniele Bonadiman , Yi-an Lai , Arshit Gupta , Nikolaos Pappas , Saab Mansour , Katrin Kirchhoff , Dan RothComments: The appendix contains data that is offensive / disturbing in natureSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Large Language Models (LLMs) are nowadays expected to generate content aligned with human preferences. Current work focuses on alignment at model training time, through techniques such as Reinforcement Learning with Human Feedback (RLHF). However, it is unclear if such methods are an effective choice to teach alignment objectives to the model. First, the inability to incorporate multiple, custom rewards and reliance on a model developer's view of universal and static principles are key limitations. Second, the residual gaps in model training and the reliability of such approaches are also questionable (e.g. susceptibility to jail-breaking even after safety training). To address these, we propose DeAL, a framework that allows the user to customize reward functions and enables Decoding-time Alignment of LLMs (DeAL). At its core, we view decoding as a heuristic-guided search process and facilitate the use of a wide variety of alignment objectives. Our experiments with programmatic constraints such as keyword and length constraints (studied widely in the pre-LLM era) and abstract objectives such as harmlessness and helpfulness (proposed in the post-LLM era) show that we can DeAL with fine-grained trade-offs, improve adherence to alignment objectives, and address residual gaps in LLMs. Lastly, while DeAL can be effectively paired with RLHF and prompting techniques, its generality makes decoding slower, an optimization we leave for future work.
- [114] arXiv:2402.06264 [ pdf , ps , other ]
-
Title: LLaVA-Docent: Instruction Tuning with Multimodal Large Language Model to Support Art Appreciation EducationUnggi Lee , Minji Jeon , Yunseo Lee , Gyuri Byun , Yoorim Son , Jaeyoon Shin , Hongkyu Ko , Hyeoncheol KimComments: 37 pages, 4 figures, 10 tablesSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Social and Information Networks (cs.SI)
Abstract: Art appreciation is vital in nurturing critical thinking and emotional intelligence among learners. However, traditional art appreciation education has often been hindered by limited access to art resources, especially for disadvantaged students, and an imbalanced emphasis on STEM subjects in mainstream education. In response to these challenges, recent technological advancements have paved the way for innovative solutions. This study explores the application of multi-modal large language models (MLLMs) in art appreciation education, focusing on developing LLaVA-Docent, a model that leverages these advancements. Our approach involved a comprehensive literature review and consultations with experts in the field, leading to developing a robust data framework. Utilizing this framework, we generated a virtual dialogue dataset that was leveraged by GPT-4. This dataset was instrumental in training the MLLM, named LLaVA-Docent. Six researchers conducted quantitative and qualitative evaluations of LLaVA-Docent to assess its effectiveness, benchmarking it against the GPT-4 model in a few-shot setting. The evaluation process revealed distinct strengths and weaknesses of the LLaVA-Docent model. Our findings highlight the efficacy of LLaVA-Docent in enhancing the accessibility and engagement of art appreciation education. By harnessing the potential of MLLMs, this study makes a significant contribution to the field of art education, proposing a novel methodology that reimagines the way art appreciation is taught and experienced.
- [115] arXiv:2402.06326 [ pdf , ps , html , other ]
-
Title: Prompt Learning on Temporal Interaction GraphsXi Chen , Siwei Zhang , Yun Xiong , Xixi Wu , Jiawei Zhang , Xiangguo Sun , Yao Zhang , Feng Zhao , Yulin KangComments: 11 pages, 8 figuresSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Abstract: Temporal Interaction Graphs (TIGs) are widely utilized to represent real-world systems. To facilitate representation learning on TIGs, researchers have proposed a series of TIG models. However, these models are still facing two tough gaps between the pre-training and downstream predictions in their ``pre-train, predict'' training paradigm. First, the temporal discrepancy between the pre-training and inference data severely undermines the models' applicability in distant future predictions on the dynamically evolving data. Second, the semantic divergence between pretext and downstream tasks hinders their practical applications, as they struggle to align with their learning and prediction capabilities across application scenarios.
Recently, the ``pre-train, prompt'' paradigm has emerged as a lightweight mechanism for model generalization. Applying this paradigm is a potential solution to solve the aforementioned challenges. However, the adaptation of this paradigm to TIGs is not straightforward. The application of prompting in static graph contexts falls short in temporal settings due to a lack of consideration for time-sensitive dynamics and a deficiency in expressive power. To address this issue, we introduce Temporal Interaction Graph Prompting (TIGPrompt), a versatile framework that seamlessly integrates with TIG models, bridging both the temporal and semantic gaps. In detail, we propose a temporal prompt generator to offer temporally-aware prompts for different tasks. These prompts stand out for their minimalistic design, relying solely on the tuning of the prompt generator with very little supervision data. To cater to varying computational resource demands, we propose an extended ``pre-train, prompt-based fine-tune'' paradigm, offering greater flexibility. Through extensive experiments, the TIGPrompt demonstrates the SOTA performance and remarkable efficiency advantages. - [116] arXiv:2402.06359 [ pdf , ps , other ]
-
Title: Modelling Human Values for AI ReasoningSubjects: Artificial Intelligence (cs.AI) ; Multiagent Systems (cs.MA)
Abstract: One of today's most significant societal challenges is building AI systems whose behaviour, or the behaviour it enables within communities of interacting agents (human and artificial), aligns with human values. To address this challenge, we detail a formal model of human values for their explicit computational representation. To our knowledge, this has not been attempted as yet, which is surprising given the growing volume of research integrating values within AI. Taking as our starting point the wealth of research investigating the nature of human values from social psychology over the last few decades, we set out to provide such a formal model. We show how this model can provide the foundational apparatus for AI-based reasoning over values, and demonstrate its applicability in real-world use cases. We illustrate how our model captures the key ideas from social psychology research and propose a roadmap for future integrated, and interdisciplinary, research into human values in AI. The ability to automatically reason over values not only helps address the value alignment problem but also facilitates the design of AI systems that can support individuals and communities in making more informed, value-aligned decisions. More and more, individuals and organisations are motivated to understand their values more explicitly and explore whether their behaviours and attitudes properly reflect them. Our work on modelling human values will enable AI systems to be designed and deployed to meet this growing need.
- [117] arXiv:2402.06389 [ pdf , ps , other ]
-
Title: Human Aesthetic Preference-Based Large Text-to-Image Model Personalization: Kandinsky Generation as an ExampleComments: 9 pages, 10 figuresSubjects: Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
Abstract: With the advancement of neural generative capabilities, the art community has actively embraced GenAI (generative artificial intelligence) for creating painterly content. Large text-to-image models can quickly generate aesthetically pleasing outcomes. However, the process can be non-deterministic and often involves tedious trial-and-error, as users struggle with formulating effective prompts to achieve their desired results. This paper introduces a prompting-free generative approach that empowers users to automatically generate personalized painterly content that incorporates their aesthetic preferences in a customized artistic style. This approach involves utilizing ``semantic injection'' to customize an artist model in a specific artistic style, and further leveraging a genetic algorithm to optimize the prompt generation process through real-time iterative human feedback. By solely relying on the user's aesthetic evaluation and preference for the artist model-generated images, this approach creates the user a personalized model that encompasses their aesthetic preferences and the customized artistic style.
- [118] arXiv:2402.06487 [ pdf , ps , other ]
-
Title: Le Nozze di Giustizia. Interactions between Artificial Intelligence, Law, Logic, Language and Computation with some case studies in Traffic Regulations and Health CareSubjects: Artificial Intelligence (cs.AI) ; Computers and Society (cs.CY)
Abstract: An important aim of this paper is to convey some basics of mathematical logic to the legal community working with Artificial Intelligence. After analysing what AI is, we decide to delimit ourselves to rule-based AI leaving Neural Networks and Machine Learning aside. Rule based AI allows for Formal methods which are described in a rudimentary form. We will then see how mathematical logic interacts with legal rule-based AI practice. We shall see how mathematical logic imposes limitations and complications to AI applications. We classify the limitations and interactions between mathematical logic and legal AI in three categories: logical, computational and mathematical. The examples to showcase the interactions will largely come from European traffic regulations. The paper closes off with some reflections on how and where AI could be used and on basic mechanisms that shape society.
- [119] arXiv:2402.06500 [ pdf , ps , html , other ]
-
Title: On the Fly Detection of Root Causes from Observed Data with Application to IT SystemsSubjects: Artificial Intelligence (cs.AI)
Abstract: This paper introduces a new structural causal model tailored for representing threshold-based IT systems and presents a new algorithm designed to rapidly detect root causes of anomalies in such systems. When root causes are not causally related, the method is proven to be correct; while an extension is proposed based on the intervention of an agent to relax this assumption. Our algorithm and its agent-based extension leverage causal discovery from offline data and engage in subgraph traversal when encountering new anomalies in online data. Our extensive experiments demonstrate the superior performance of our methods, even when applied to data generated from alternative structural causal models or real IT monitoring data.
- [120] arXiv:2402.06503 [ pdf , ps , other ]
-
Title: ACTER: Diverse and Actionable Counterfactual Sequences for Explaining and Diagnosing RL PoliciesComments: 17 pages, 4 FiguresSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Understanding how failure occurs and how it can be prevented in reinforcement learning (RL) is necessary to enable debugging, maintain user trust, and develop personalized policies. Counterfactual reasoning has often been used to assign blame and understand failure by searching for the closest possible world in which the failure is avoided. However, current counterfactual state explanations in RL can only explain an outcome using just the current state features and offer no actionable recourse on how a negative outcome could have been prevented. In this work, we propose ACTER (Actionable Counterfactual Sequences for Explaining Reinforcement Learning Outcomes), an algorithm for generating counterfactual sequences that provides actionable advice on how failure can be avoided. ACTER investigates actions leading to a failure and uses the evolutionary algorithm NSGA-II to generate counterfactual sequences of actions that prevent it with minimal changes and high certainty even in stochastic environments. Additionally, ACTER generates a set of multiple diverse counterfactual sequences that enable users to correct failure in the way that best fits their preferences. We also introduce three diversity metrics that can be used for evaluating the diversity of counterfactual sequences. We evaluate ACTER in two RL environments, with both discrete and continuous actions, and show that it can generate actionable and diverse counterfactual sequences. We conduct a user study to explore how explanations generated by ACTER help users identify and correct failure.
- [121] arXiv:2402.06529 [ pdf , ps , html , other ]
-
Title: Introspective Planning: Guiding Language-Enabled Agents to Refine Their Own UncertaintyComments: 22 pages, 15 figuresSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Large language models (LLMs) exhibit advanced reasoning skills, enabling robots to comprehend natural language instructions and strategically plan high-level actions through proper grounding. However, LLM hallucination may result in robots confidently executing plans that are misaligned with user goals or, in extreme cases, unsafe. Additionally, inherent ambiguity in natural language instructions can induce task uncertainty, particularly in situations where multiple valid options exist. To address this issue, LLMs must identify such uncertainty and proactively seek clarification. This paper explores the concept of introspective planning as a systematic method for guiding LLMs in forming uncertainty--aware plans for robotic task execution without the need for fine-tuning. We investigate uncertainty quantification in task-level robot planning and demonstrate that introspection significantly improves both success rates and safety compared to state-of-the-art LLM-based planning approaches. Furthermore, we assess the effectiveness of introspective planning in conjunction with conformal prediction, revealing that this combination yields tighter confidence bounds, thereby maintaining statistical success guarantees with fewer superfluous user clarification queries.
- [122] arXiv:2402.06557 [ pdf , ps , html , other ]
-
Title: The Quantified Boolean Bayesian Network: Theory and Experiments with a Logical Graphical ModelSubjects: Artificial Intelligence (cs.AI) ; Information Retrieval (cs.IR)
Abstract: This paper introduces the Quantified Boolean Bayesian Network (QBBN), which provides a unified view of logical and probabilistic reasoning. The QBBN is meant to address a central problem with the Large Language Model (LLM), which has become extremely popular in Information Retrieval, which is that the LLM hallucinates. A Bayesian Network, by construction, cannot hallucinate, because it can only return answers that it can explain. We show how a Bayesian Network over an unbounded number of boolean variables can be configured to represent the logical reasoning underlying human language. We do this by creating a key-value version of the First-Order Calculus, for which we can prove consistency and completeness. We show that the model is trivially trained over fully observed data, but that inference is non-trivial. Exact inference in a Bayesian Network is intractable (i.e. $\Omega(2^N)$ for $N$ variables). For inference, we investigate the use of Loopy Belief Propagation (LBP), which is not guaranteed to converge, but which has been shown to often converge in practice. Our experiments show that LBP indeed does converge very reliably, and our analysis shows that a round of LBP takes time $O(N2^n)$, where $N$ bounds the number of variables considered, and $n$ bounds the number of incoming connections to any factor, and further improvements may be possible. Our network is specifically designed to alternate between AND and OR gates in a Boolean Algebra, which connects more closely to logical reasoning, allowing a completeness proof for an expanded version of our network, and also allows inference to follow specific but adequate pathways, that turn out to be fast.
- [123] arXiv:2402.06590 [ pdf , ps , html , other ]
-
Title: Predictive representations: building blocks of intelligenceSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Adaptive behavior often requires predicting future events. The theory of reinforcement learning prescribes what kinds of predictive representations are useful and how to compute them. This paper integrates these theoretical ideas with work on cognition and neuroscience. We pay special attention to the successor representation (SR) and its generalizations, which have been widely applied both as engineering tools and models of brain function. This convergence suggests that particular kinds of predictive representations may function as versatile building blocks of intelligence.
- [124] arXiv:2402.06596 [ pdf , ps , other ]
-
Title: Understanding the Weakness of Large Language Model Agents within a Complex Android EnvironmentSubjects: Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC); Software Engineering (cs.SE)
Abstract: Large language models (LLMs) have empowered intelligent agents to execute intricate tasks within domain-specific software such as browsers and games. However, when applied to general-purpose software systems like operating systems, LLM agents face three primary challenges. Firstly, the action space is vast and dynamic, posing difficulties for LLM agents to maintain an up-to-date understanding and deliver accurate responses. Secondly, real-world tasks often require inter-application cooperation}, demanding farsighted planning from LLM agents. Thirdly, agents need to identify optimal solutions aligning with user constraints, such as security concerns and preferences. These challenges motivate AndroidArena, an environment and benchmark designed to evaluate LLM agents on a modern operating system. To address high-cost of manpower, we design a scalable and semi-automated method to construct the benchmark. In the task evaluation, AndroidArena incorporates accurate and adaptive metrics to address the issue of non-unique solutions. Our findings reveal that even state-of-the-art LLM agents struggle in cross-APP scenarios and adhering to specific constraints. Additionally, we identify a lack of four key capabilities, i.e., understanding, reasoning, exploration, and reflection, as primary reasons for the failure of LLM agents. Furthermore, we provide empirical analysis on the failure of reflection, and improve the success rate by 27% with our proposed exploration strategy. This work is the first to present valuable insights in understanding fine-grained weakness of LLM agents, and offers a path forward for future research in this area. Environment, benchmark, and evaluation code for AndroidArena are released at this https URL .
- [125] arXiv:2402.06634 [ pdf , ps , html , other ]
-
Title: SocraSynth: Multi-LLM Reasoning with Conditional StatisticsComments: 1 figure, 6 tables, 6 appendicesSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Large language models (LLMs), while promising, face criticisms for biases, hallucinations, and a lack of reasoning capability. This paper introduces SocraSynth, a multi-LLM agent reasoning platform developed to mitigate these issues. SocraSynth utilizes conditional statistics and systematic context enhancement through continuous arguments, alongside adjustable debate contentiousness levels. The platform typically involves a human moderator and two LLM agents representing opposing viewpoints on a given subject. SocraSynth operates in two main phases: knowledge generation and reasoning evaluation. In the knowledge generation phase, the moderator defines the debate topic and contentiousness level, prompting the agents to formulate supporting arguments for their respective stances. The reasoning evaluation phase then employs Socratic reasoning and formal logic principles to appraise the quality of the arguments presented. The dialogue concludes with the moderator adjusting the contentiousness from confrontational to collaborative, gathering final, conciliatory remarks to aid in human reasoning and decision-making. Through case studies in three distinct application domains, this paper showcases SocraSynth's effectiveness in fostering rigorous research, dynamic reasoning, comprehensive assessment, and enhanced collaboration. This underscores the value of multi-agent interactions in leveraging LLMs for advanced knowledge extraction and decision-making support.
- [126] arXiv:2402.06640 [ pdf , ps , other ]
-
Title: Modeling and Optimization of Epidemiological Control Policies Through Reinforcement LearningComments: 22 pages, 8 figuresJournal-ref: J. Emerging Investigators Article (2023) Vol. 6Subjects: Artificial Intelligence (cs.AI) ; Populations and Evolution (q-bio.PE)
Abstract: Pandemics involve the high transmission of a disease that impacts global and local health and economic patterns. The impact of a pandemic can be minimized by enforcing certain restrictions on a community. However, while minimizing infection and death rates, these restrictions can also lead to economic crises. Epidemiological models help propose pandemic control strategies based on non-pharmaceutical interventions such as social distancing, curfews, and lockdowns, reducing the economic impact of these restrictions. However, designing manual control strategies while considering disease spread and economic status is non-trivial. Optimal strategies can be designed through multi-objective reinforcement learning (MORL) models, which demonstrate how restrictions can be used to optimize the outcome of a pandemic. In this research, we utilized an epidemiological Susceptible, Exposed, Infected, Recovered, Deceased (SEIRD) model: a compartmental model for virtually simulating a pandemic day by day. We combined the SEIRD model with a deep double recurrent Q-network to train a reinforcement learning agent to enforce the optimal restriction on the SEIRD simulation based on a reward function. We tested two agents with unique reward functions and pandemic goals to obtain two strategies. The first agent placed long lockdowns to reduce the initial spread of the disease, followed by cyclical and shorter lockdowns to mitigate the resurgence of the disease. The second agent provided similar infection rates but an improved economy by implementing a 10-day lockdown and 20-day no-restriction cycle. This use of reinforcement learning and epidemiological modeling allowed for both economic and infection mitigation in multiple pandemic scenarios.
- [127] arXiv:2402.06647 [ pdf , ps , html , other ]
-
Title: A Survey on Large Language Model Hallucination via a Creativity PerspectiveComments: 9 pages, 4 figuresSubjects: Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC)
Abstract: Hallucinations in large language models (LLMs) are always seen as limitations. However, could they also be a source of creativity? This survey explores this possibility, suggesting that hallucinations may contribute to LLM application by fostering creativity. This survey begins with a review of the taxonomy of hallucinations and their negative impact on LLM reliability in critical applications. Then, through historical examples and recent relevant theories, the survey explores the potential creative benefits of hallucinations in LLMs. To elucidate the value and evaluation criteria of this connection, we delve into the definitions and assessment methods of creativity. Following the framework of divergent and convergent thinking phases, the survey systematically reviews the literature on transforming and harnessing hallucinations for creativity in LLMs. Finally, the survey discusses future research directions, emphasizing the need to further explore and refine the application of hallucinations in creative processes within LLMs.
- [128] arXiv:2402.06654 [ pdf , ps , other ]
-
Title: Conversational Crowdsensing: A Parallel Intelligence Powered Novel Sensing ApproachZhengqiu Zhu , Yong Zhao , Bin Chen , Sihang Qiu , Kai Xu , Quanjun Yin , Jincai Huang , Zhong Liu , Fei-Yue WangSubjects: Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC)
Abstract: The transition from CPS-based Industry 4.0 to CPSS-based Industry 5.0 brings new requirements and opportunities to current sensing approaches, especially in light of recent progress in Chatbots and Large Language Models (LLMs). Therefore, the advancement of parallel intelligence-powered Crowdsensing Intelligence (CSI) is witnessed, which is currently advancing towards linguistic intelligence. In this paper, we propose a novel sensing paradigm, namely conversational crowdsensing, for Industry 5.0. It can alleviate workload and professional requirements of individuals and promote the organization and operation of diverse workforce, thereby facilitating faster response and wider popularization of crowdsensing systems. Specifically, we design the architecture of conversational crowdsensing to effectively organize three types of participants (biological, robotic, and digital) from diverse communities. Through three levels of effective conversation (i.e., inter-human, human-AI, and inter-AI), complex interactions and service functionalities of different workers can be achieved to accomplish various tasks across three sensing phases (i.e., requesting, scheduling, and executing). Moreover, we explore the foundational technologies for realizing conversational crowdsensing, encompassing LLM-based multi-agent systems, scenarios engineering and conversational human-AI cooperation. Finally, we present potential industrial applications of conversational crowdsensing and discuss its implications. We envision that conversations in natural language will become the primary communication channel during crowdsensing process, enabling richer information exchange and cooperative problem-solving among humans, robots, and AI.
- [129] arXiv:2402.06660 [ pdf , ps , other ]
-
Title: The role of the metaverse in calibrating an embodied artificial general intelligenceComments: Presented at the conference second international conference on human-centred AI ethics: seeing the human in the artificial (HCAIE 2023): this https URLSubjects: Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC)
Abstract: This paper examines the concept of embodied artificial general intelligence (AGI), its relationship to human consciousness, and the key role of the metaverse in facilitating this relationship. The paper leverages theoretical frameworks such as embodied cognition, Michael Levin's computational boundary of a "Self," Donald D. Hoffman's Interface Theory of Perception, and Bernardo Kastrup's analytical idealism to build the argument for achieving embodied AGI. It contends that our perceived outer reality is a symbolic representation of alternate inner states of being, and that AGI could embody a higher consciousness with a larger computational boundary. The paper further discusses the developmental stages of AGI, the requirements for the emergence of an embodied AGI, the importance of a calibrated symbolic interface for AGI, and the key role played by the metaverse, decentralized systems, open-source blockchain technology, as well as open-source AI research. It also explores the idea of a feedback loop between AGI and human users in metaverse spaces as a tool for AGI calibration, as well as the role of local homeostasis and decentralized governance as preconditions for achieving a stable embodied AGI. The paper concludes by emphasizing the importance of achieving a certain degree of harmony in human relations and recognizing the interconnectedness of humanity at a global level, as key prerequisites for the emergence of a stable embodied AGI.
- [130] arXiv:2402.06665 [ pdf , ps , html , other ]
-
Title: The Essential Role of Causality in Foundation World Models for Embodied AITarun Gupta , Wenbo Gong , Chao Ma , Nick Pawlowski , Agrin Hilmkil , Meyer Scetbon , Marc Rigter , Ade Famoti , Ashley Juan Llorens , Jianfeng Gao , Stefan Bauer , Danica Kragic , Bernhard Schölkopf , Cheng ZhangSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG); Robotics (cs.RO)
Abstract: Recent advances in foundation models, especially in large multi-modal models and conversational agents, have ignited interest in the potential of generally capable embodied agents. Such agents will require the ability to perform new tasks in many different real-world environments. However, current foundation models fail to accurately model physical interactions and are therefore insufficient for Embodied AI. The study of causality lends itself to the construction of veridical world models, which are crucial for accurately predicting the outcomes of possible interactions. This paper focuses on the prospects of building foundation world models for the upcoming generation of embodied agents and presents a novel viewpoint on the significance of causality within these. We posit that integrating causal considerations is vital to facilitating meaningful physical interactions with the world. Finally, we demystify misconceptions about causality in this context and present our outlook for future research.
- [131] arXiv:2402.06673 [ pdf , ps , html , other ]
-
Title: Advancing Explainable AI Toward Human-Like Intelligence: Forging the Path to Artificial BrainSubjects: Artificial Intelligence (cs.AI)
Abstract: The intersection of Artificial Intelligence (AI) and neuroscience in Explainable AI (XAI) is pivotal for enhancing transparency and interpretability in complex decision-making processes. This paper explores the evolution of XAI methodologies, ranging from feature-based to human-centric approaches, and delves into their applications in diverse domains, including healthcare and finance. The challenges in achieving explainability in generative models, ensuring responsible AI practices, and addressing ethical implications are discussed. The paper further investigates the potential convergence of XAI with cognitive sciences, the development of emotionally intelligent AI, and the quest for Human-Like Intelligence (HLI) in AI systems. As AI progresses towards Artificial General Intelligence (AGI), considerations of consciousness, ethics, and societal impact become paramount. The ongoing pursuit of deciphering the mysteries of the brain with AI and the quest for HLI represent transformative endeavors, bridging technical advancements with multidisciplinary explorations of human cognition.
- [132] arXiv:2402.06695 [ pdf , ps , html , other ]
-
Title: Integrating LLMs for Explainable Fault Diagnosis in Complex SystemsComments: 4 pagesSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Systems and Control (eess.SY)
Abstract: This paper introduces an integrated system designed to enhance the explainability of fault diagnostics in complex systems, such as nuclear power plants, where operator understanding is critical for informed decision-making. By combining a physics-based diagnostic tool with a Large Language Model, we offer a novel solution that not only identifies faults but also provides clear, understandable explanations of their causes and implications. The system's efficacy is demonstrated through application to a molten salt facility, showcasing its ability to elucidate the connections between diagnosed faults and sensor data, answer operator queries, and evaluate historical sensor anomalies. Our approach underscores the importance of merging model-based diagnostics with advanced AI to improve the reliability and transparency of autonomous systems.
- [133] arXiv:2402.06764 [ pdf , ps , html , other ]
-
Title: GLaM: Fine-Tuning Large Language Models for Domain Knowledge Graph Alignment via Neighborhood Partitioning and Generative Subgraph EncodingComments: Published in AAAI Spring Symposium: AAAI-MAKE 2024Subjects: Artificial Intelligence (cs.AI)
Abstract: Integrating large language models (LLMs) with knowledge graphs derived from domain-specific data represents an important advancement towards more powerful and factual reasoning. As these models grow more capable, it is crucial to enable them to perform multi-step inferences over real-world knowledge graphs while minimizing hallucination. While large language models excel at conversation and text generation, their ability to reason over domain-specialized graphs of interconnected entities remains limited. For example, can we query a LLM to identify the optimal contact in a professional network for a specific goal, based on relationships and attributes in a private database? The answer is no--such capabilities lie beyond current methods. However, this question underscores a critical technical gap that must be addressed. Many high-value applications in areas such as science, security, and e-commerce rely on proprietary knowledge graphs encoding unique structures, relationships, and logical constraints. We introduce a fine-tuning framework for developing Graph-aligned LAnguage Models (GLaM) that transforms a knowledge graph into an alternate text representation with labeled question-answer pairs. We demonstrate that grounding the models in specific graph-based knowledge expands the models' capacity for structure-based reasoning. Our methodology leverages the large-language model's generative capabilities to create the dataset and proposes an efficient alternate to retrieval-augmented generation styled methods.
- [134] arXiv:2402.06782 [ pdf , ps , html , other ]
-
Title: Debating with More Persuasive LLMs Leads to More Truthful AnswersAkbir Khan , John Hughes , Dan Valentine , Laura Ruis , Kshitij Sachan , Ansh Radhakrishnan , Edward Grefenstette , Samuel R. Bowman , Tim Rocktäschel , Ethan PerezComments: For code please check: this https URLSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Common methods for aligning large language models (LLMs) with desired behaviour heavily rely on human-labelled data. However, as models grow increasingly sophisticated, they will surpass human expertise, and the role of human evaluation will evolve into non-experts overseeing experts. In anticipation of this, we ask: can weaker models assess the correctness of stronger models? We investigate this question in an analogous setting, where stronger models (experts) possess the necessary information to answer questions and weaker models (non-experts) lack this information. The method we evaluate is \textit{debate}, where two LLM experts each argue for a different answer, and a non-expert selects the answer. We find that debate consistently helps both non-expert models and humans answer questions, achieving 76\% and 88\% accuracy respectively (naive baselines obtain 48\% and 60\%). Furthermore, optimising expert debaters for persuasiveness in an unsupervised manner improves non-expert ability to identify the truth in debates. Our results provide encouraging empirical evidence for the viability of aligning models with debate in the absence of ground truth.
- [135] arXiv:2402.06811 [ pdf , ps , html , other ]
-
Title: Discipline and Label: A WEIRD Genealogy and Social Theory of Data AnnotationAndrew Smart , Ding Wang , Ellis Monk , Mark Díaz , Atoosa Kasirzadeh , Erin Van Liemt , Sonja Schmer-GalunderComments: 18 pagesSubjects: Artificial Intelligence (cs.AI)
Abstract: Data annotation remains the sine qua non of machine learning and AI. Recent empirical work on data annotation has begun to highlight the importance of rater diversity for fairness, model performance, and new lines of research have begun to examine the working conditions for data annotation workers, the impacts and role of annotator subjectivity on labels, and the potential psychological harms from aspects of annotation work. This paper outlines a critical genealogy of data annotation; starting with its psychological and perceptual aspects. We draw on similarities with critiques of the rise of computerized lab-based psychological experiments in the 1970's which question whether these experiments permit the generalization of results beyond the laboratory settings within which these results are typically obtained. Do data annotations permit the generalization of results beyond the settings, or locations, in which they were obtained? Psychology is overly reliant on participants from Western, Educated, Industrialized, Rich, and Democratic societies (WEIRD). Many of the people who work as data annotation platform workers, however, are not from WEIRD countries; most data annotation workers are based in Global South countries. Social categorizations and classifications from WEIRD countries are imposed on non-WEIRD annotators through instructions and tasks, and through them, on data, which is then used to train or evaluate AI models in WEIRD countries. We synthesize evidence from several recent lines of research and argue that data annotation is a form of automated social categorization that risks entrenching outdated and static social categories that are in reality dynamic and changing. We propose a framework for understanding the interplay of the global social conditions of data annotation with the subjective phenomenological experience of data annotation work.
- [136] arXiv:2402.06852 [ pdf , ps , other ]
-
Title: ChemLLM: A Chemical Large Language ModelDi Zhang , Wei Liu , Qian Tan , Jingdan Chen , Hang Yan , Yuliang Yan , Jiatong Li , Weiran Huang , Xiangyu Yue , Wanli Ouyang , Dongzhan Zhou , Shufei Zhang , Mao Su , Han-Sen Zhong , Yuqiang LiComments: 9 pages, 5 figuresSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Large language models (LLMs) have made impressive progress in chemistry applications. However, the community lacks an LLM specifically designed for chemistry. The main challenges are two-fold: firstly, most chemical data and scientific knowledge are stored in structured databases, which limits the model's ability to sustain coherent dialogue when used directly. Secondly, there is an absence of objective and fair benchmark that encompass most chemistry tasks. Here, we introduce ChemLLM, a comprehensive framework that features the first LLM dedicated to chemistry. It also includes ChemData, a dataset specifically designed for instruction tuning, and ChemBench, a robust benchmark covering nine essential chemistry tasks. ChemLLM is adept at performing various tasks across chemical disciplines with fluid dialogue interaction. Notably, ChemLLM achieves results comparable to GPT-4 on the core chemical tasks and demonstrates competitive performance with LLMs of similar size in general scenarios. ChemLLM paves a new path for exploration in chemical studies, and our method of incorporating structured chemical knowledge into dialogue systems sets a new standard for developing LLMs in various scientific fields. Codes, Datasets, and Model weights are publicly accessible at this https URL
- [137] arXiv:2402.06861 [ pdf , ps , html , other ]
-
Title: UrbanKGent: A Unified Large Language Model Agent Framework for Urban Knowledge Graph ConstructionComments: Under reviewSubjects: Artificial Intelligence (cs.AI)
Abstract: Urban knowledge graph has recently worked as an emerging building block to distill critical knowledge from multi-sourced urban data for diverse urban application scenarios. Despite its promising benefits, urban knowledge graph construction (UrbanKGC) still heavily relies on manual effort, hindering its potential advancement. This paper presents UrbanKGent, a unified large language model agent framework, for urban knowledge graph construction. Specifically, we first construct the knowledgeable instruction set for UrbanKGC tasks (such as relational triplet extraction and knowledge graph completion) via heterogeneity-aware and geospatial-infused instruction generation. Moreover, we propose a tool-augmented iterative trajectory refinement module to enhance and refine the trajectories distilled from GPT-4. Through hybrid instruction fine-tuning with augmented trajectories on Llama-2-13B, we obtain the UrbanKGC agent, UrbanKGent-13B. We perform a comprehensive evaluation on two real-world datasets using both human and GPT-4 self-evaluation. The experimental results demonstrate that UrbanKGent-13B not only can significantly outperform 21 baselines in UrbanKGC tasks, but also surpass the state-of-the-art LLM, GPT-4, by more than 10\% with approximately 20 times lower cost. We deploy UrbanKGent-13B to provide online services, which can construct an UrbanKG with thousands of times richer relationships using only one-fifth of the data compared with the existing benchmark. Our data, code, and opensource UrbanKGC agent are available at this https URL .
- [138] arXiv:2402.06929 [ pdf , ps , html , other ]
-
Title: Making a prototype of Seoul historical sites chatbot using LangchainComments: 4 pages, 4 figures, draftSubjects: Artificial Intelligence (cs.AI)
Abstract: In this paper, we are going to share a draft of the development of a conversational agent created to disseminate information about historical sites located in the Seoul. The primary objective of the agent is to increase awareness among visitors who are not familiar with Seoul, about the presence and precise locations of valuable cultural heritage sites. It aims to promote a basic understanding of Korea's rich and diverse cultural history. The agent is thoughtfully designed for accessibility in English and utilizes data generously provided by the Seoul Metropolitan Government. Despite the limited data volume, it consistently delivers reliable and accurate responses, seamlessly aligning with the available information. We have meticulously detailed the methodologies employed in creating this agent and provided a comprehensive overview of its underlying structure within the paper. Additionally, we delve into potential improvements to enhance this initial version of the system, with a primary emphasis on expanding the available data through our prompting. In conclusion, we provide an in-depth discussion of our expectations regarding the future impact of this agent in promoting and facilitating the sharing of historical sites.
- [139] arXiv:2402.07016 [ pdf , ps , other ]
-
Title: REALM: RAG-Driven Enhancement of Multimodal Electronic Health Records Analysis via Large Language ModelsYinghao Zhu , Changyu Ren , Shiyun Xie , Shukai Liu , Hangyuan Ji , Zixiang Wang , Tao Sun , Long He , Zhoujun Li , Xi Zhu , Chengwei PanSubjects: Artificial Intelligence (cs.AI)
Abstract: The integration of multimodal Electronic Health Records (EHR) data has significantly improved clinical predictive capabilities. Leveraging clinical notes and multivariate time-series EHR, existing models often lack the medical context relevent to clinical tasks, prompting the incorporation of external knowledge, particularly from the knowledge graph (KG). Previous approaches with KG knowledge have primarily focused on structured knowledge extraction, neglecting unstructured data modalities and semantic high dimensional medical knowledge. In response, we propose REALM, a Retrieval-Augmented Generation (RAG) driven framework to enhance multimodal EHR representations that address these limitations. Firstly, we apply Large Language Model (LLM) to encode long context clinical notes and GRU model to encode time-series EHR data. Secondly, we prompt LLM to extract task-relevant medical entities and match entities in professionally labeled external knowledge graph (PrimeKG) with corresponding medical knowledge. By matching and aligning with clinical standards, our framework eliminates hallucinations and ensures consistency. Lastly, we propose an adaptive multimodal fusion network to integrate extracted knowledge with multimodal EHR data. Our extensive experiments on MIMIC-III mortality and readmission tasks showcase the superior performance of our REALM framework over baselines, emphasizing the effectiveness of each module. REALM framework contributes to refining the use of multimodal EHR data in healthcare and bridging the gap with nuanced medical context essential for informed clinical predictions.
- [140] arXiv:2402.07039 [ pdf , ps , other ]
-
Title: Coordinated Disclosure for AI: Beyond Security VulnerabilitiesSubjects: Artificial Intelligence (cs.AI) ; Cryptography and Security (cs.CR); Computers and Society (cs.CY)
Abstract: Harm reporting in the field of Artificial Intelligence (AI) currently operates on an ad hoc basis, lacking a structured process for disclosing or addressing algorithmic flaws. In contrast, the Coordinated Vulnerability Disclosure (CVD) ethos and ecosystem play a pivotal role in software security and transparency. Within the U.S. context, there has been a protracted legal and policy struggle to establish a safe harbor from the Computer Fraud and Abuse Act, aiming to foster institutional support for security researchers acting in good faith. Notably, algorithmic flaws in Machine Learning (ML) models present distinct challenges compared to traditional software vulnerabilities, warranting a specialized approach. To address this gap, we propose the implementation of a dedicated Coordinated Flaw Disclosure (CFD) framework tailored to the intricacies of machine learning and artificial intelligence issues. This paper delves into the historical landscape of disclosures in ML, encompassing the ad hoc reporting of harms and the emergence of participatory auditing. By juxtaposing these practices with the well-established disclosure norms in cybersecurity, we argue that the broader adoption of CFD has the potential to enhance public trust through transparent processes that carefully balance the interests of both organizations and the community.
- [141] arXiv:2402.07049 [ pdf , ps , other ]
-
Title: A Factor Graph Model of Trust for a Collaborative Multi-Agent SystemSubjects: Artificial Intelligence (cs.AI)
Abstract: In the field of Multi-Agent Systems (MAS), known for their openness, dynamism, and cooperative nature, the ability to trust the resources and services of other agents is crucial. Trust, in this setting, is the reliance and confidence an agent has in the information, behaviors, intentions, truthfulness, and capabilities of others within the system. Our paper introduces a new graphical approach that utilizes factor graphs to represent the interdependent behaviors and trustworthiness among agents. This includes modeling the behavior of robots as a trajectory of actions using a Gaussian process factor graph, which accounts for smoothness, obstacle avoidance, and trust-related factors. Our method for evaluating trust is decentralized and considers key interdependent sub-factors such as proximity safety, consistency, and cooperation. The overall system comprises a network of factor graphs that interact through trust-related factors and employs a Bayesian inference method to dynamically assess trust-based decisions with informed consent. The effectiveness of this method is validated via simulations and empirical tests with autonomous robots navigating unsignalized intersections.
- [142] arXiv:2402.07140 [ pdf , ps , html , other ]
-
Title: Graph Descriptive Order Improves Reasoning with Large Language ModelSubjects: Artificial Intelligence (cs.AI)
Abstract: In recent years, large language models have achieved state-of-the-art performance across multiple domains. However, the progress in the field of graph reasoning with LLM remains limited. Our work delves into this gap by thoroughly investigating graph reasoning with LLMs. In this work, we reveal the impact of the order of graph description on LLMs' graph reasoning performance, which significantly affects LLMs' reasoning abilities. By altering this order, we enhance the performance of LLMs from 42.22\% to 70\%. Furthermore, we introduce the Scaled Graph Reasoning benchmark for assessing LLMs' performance across various graph sizes and evaluate the relationship between LLMs' graph reasoning abilities and graph size. We discover that the graph reasoning performance of LLMs does not monotonically decrease with the increase in graph size. The experiments span several mainstream models, including GPT-3.5, LLaMA-2-7B, and LLaMA-2-13B, to offer a comprehensive evaluation.
- [143] arXiv:2402.07166 [ pdf , ps , other ]
-
Title: Social Evolution of Published Text and The Emergence of Artificial Intelligence Through Large Language Models and The Problem of Toxicity and BiasSubjects: Artificial Intelligence (cs.AI)
Abstract: We provide a birds eye view of the rapid developments in AI and Deep Learning that has led to the path-breaking emergence of AI in Large Language Models. The aim of this study is to place all these developments in a pragmatic broader historical social perspective without any exaggerations while at the same time without any pessimism that created the AI winter in the 1970s to 1990s. We also at the same time point out toxicity, bias, memorization, sycophancy, logical inconsistencies, hallucinations that exist just as a warning to the overly optimistic. We note here that just as this emergence of AI seems to occur at a threshold point in the number of neural connections or weights, it has also been observed that human brain and especially the cortex region is nothing special or extraordinary but simply a case of scaled-up version of the primate brain and that even the human intelligence seems like an emergent phenomena of scale.
- [144] arXiv:2402.07167 [ pdf , ps , other ]
-
Title: Large-Language-Model Empowered Dose Volume Histogram Prediction for Intensity Modulated RadiotherapySubjects: Artificial Intelligence (cs.AI)
Abstract: Treatment planning is currently a patient specific, time-consuming, and resource demanding task in radiotherapy. Dose-volume histogram (DVH) prediction plays a critical role in automating this process. The geometric relationship between DVHs in radiotherapy plans and organs-at-risk (OAR) and planning target volume (PTV) has been well established. This study explores the potential of deep learning models for predicting DVHs using images and subsequent human intervention facilitated by a large-language model (LLM) to enhance the planning quality. We propose a pipeline to convert unstructured images to a structured graph consisting of image-patch nodes and dose nodes. A novel Dose Graph Neural Network (DoseGNN) model is developed for predicting DVHs from the structured graph. The proposed DoseGNN is enhanced with the LLM to encode massive knowledge from prescriptions and interactive instructions from clinicians. In this study, we introduced an online human-AI collaboration (OHAC) system as a practical implementation of the concept proposed for the automation of intensity-modulated radiotherapy (IMRT) planning. In comparison to the widely-employed DL models used in radiotherapy, DoseGNN achieved mean square errors that were 80$\%$, 76$\%$ and 41.0$\%$ of those predicted by Swin U-Net Transformer, 3D U-Net CNN and vanilla MLP, respectively. Moreover, the LLM-empowered DoseGNN model facilitates seamless adjustment to treatment plans through interaction with clinicians using natural language.
- [145] arXiv:2402.07183 [ pdf , ps , other ]
-
Title: A Random Ensemble of Encrypted Vision Transformers for Adversarially Robust DefenseComments: 9 pagesSubjects: Artificial Intelligence (cs.AI)
Abstract: Deep neural networks (DNNs) are well known to be vulnerable to adversarial examples (AEs). In previous studies, the use of models encrypted with a secret key was demonstrated to be robust against white-box attacks, but not against black-box ones. In this paper, we propose a novel method using the vision transformer (ViT) that is a random ensemble of encrypted models for enhancing robustness against both white-box and black-box attacks. In addition, a benchmark attack method, called AutoAttack, is applied to models to test adversarial robustness objectively. In experiments, the method was demonstrated to be robust against not only white-box attacks but also black-box ones in an image classification task on the CIFAR-10 and ImageNet datasets. The method was also compared with the state-of-the-art in a standardized benchmark for adversarial robustness, RobustBench, and it was verified to outperform conventional defenses in terms of clean accuracy and robust accuracy.
- [146] arXiv:2402.07197 [ pdf , ps , html , other ]
-
Title: GraphTranslator: Aligning Graph Model to Large Language Model for Open-ended TasksMengmei Zhang , Mingwei Sun , Peng Wang , Shen Fan , Yanhu Mo , Xiaoxiao Xu , Hong Liu , Cheng Yang , Chuan ShiSubjects: Artificial Intelligence (cs.AI)
Abstract: Large language models (LLMs) like ChatGPT, exhibit powerful zero-shot and instruction-following capabilities, have catalyzed a revolutionary transformation across diverse fields, especially for open-ended tasks. While the idea is less explored in the graph domain, despite the availability of numerous powerful graph models (GMs), they are restricted to tasks in a pre-defined form. Although several methods applying LLMs to graphs have been proposed, they fail to simultaneously handle the pre-defined and open-ended tasks, with LLM as a node feature enhancer or as a standalone predictor. To break this dilemma, we propose to bridge the pretrained GM and LLM by a Translator, named GraphTranslator, aiming to leverage GM to handle the pre-defined tasks effectively and utilize the extended interface of LLMs to offer various open-ended tasks for GM. To train such Translator, we propose a Producer capable of constructing the graph-text alignment data along node information, neighbor information and model information. By translating node representation into tokens, GraphTranslator empowers an LLM to make predictions based on language instructions, providing a unified perspective for both pre-defined and open-ended tasks. Extensive results demonstrate the effectiveness of our proposed GraphTranslator on zero-shot node classification. The graph question answering experiments reveal our GraphTranslator potential across a broad spectrum of open-ended tasks through language instructions. Our code is available at: this https URL .
- [147] arXiv:2402.07199 [ pdf , ps , other ]
-
Title: Link-aware link prediction over temporal graph by pattern recognitionComments: 12 pages, one columnSubjects: Artificial Intelligence (cs.AI)
Abstract: A temporal graph can be considered as a stream of links, each of which represents an interaction between two nodes at a certain time. On temporal graphs, link prediction is a common task, which aims to answer whether the query link is true or not. To do this task, previous methods usually focus on the learning of representations of the two nodes in the query link. We point out that the learned representation by their models may encode too much information with side effects for link prediction because they have not utilized the information of the query link, i.e., they are link-unaware. Based on this observation, we propose a link-aware model: historical links and the query link are input together into the following model layers to distinguish whether this input implies a reasonable pattern that ends with the query link. During this process, we focus on the modeling of link evolution patterns rather than node representations. Experiments on six datasets show that our model achieves strong performances compared with state-of-the-art baselines, and the results of link prediction are interpretable. The code and datasets are available on the project website: this https URL .
- [148] arXiv:2402.07204 [ pdf , ps , other ]
-
Title: Synergizing Spatial Optimization with Large Language Models for Open-Domain Urban Itinerary PlanningYihong Tang , Zhaokai Wang , Ao Qu , Yihao Yan , Kebing Hou , Dingyi Zhuang , Xiaotong Guo , Jinhua Zhao , Zhan Zhao , Wei MaSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: In this paper, we for the first time propose the task of Open-domain Urban Itinerary Planning (OUIP) for citywalk, which directly generates itineraries based on users' requests described in natural language. OUIP is different from conventional itinerary planning, which limits users from expressing more detailed needs and hinders true personalization. Recently, large language models (LLMs) have shown potential in handling diverse tasks. However, due to non-real-time information, incomplete knowledge, and insufficient spatial awareness, they are unable to independently deliver a satisfactory user experience in OUIP. Given this, we present ItiNera, an OUIP system that synergizes spatial optimization with Large Language Models (LLMs) to provide services that customize urban itineraries based on users' needs. Specifically, we develop an LLM-based pipeline for extracting and updating POI features to create a user-owned personalized POI database. For each user request, we leverage LLM in cooperation with an embedding-based module for retrieving candidate POIs from the user's POI database. Then, a spatial optimization module is used to order these POIs, followed by LLM crafting a personalized, spatially coherent itinerary. To the best of our knowledge, this study marks the first integration of LLMs to innovate itinerary planning solutions. Extensive experiments on offline datasets and online subjective evaluation have demonstrated the capacities of our system to deliver more responsive and spatially coherent itineraries than current LLM-based solutions. Our system has been deployed in production at the TuTu online travel service and has attracted thousands of users for their urban travel planning.
- [149] arXiv:2402.07221 [ pdf , ps , other ]
-
Title: The Reasons that Agents Act: Intention and Instrumental GoalsComments: AAMAS24Subjects: Artificial Intelligence (cs.AI)
Abstract: Intention is an important and challenging concept in AI. It is important because it underlies many other concepts we care about, such as agency, manipulation, legal responsibility, and blame. However, ascribing intent to AI systems is contentious, and there is no universally accepted theory of intention applicable to AI agents. We operationalise the intention with which an agent acts, relating to the reasons it chooses its decision. We introduce a formal definition of intention in structural causal influence models, grounded in the philosophy literature on intent and applicable to real-world machine learning systems. Through a number of examples and results, we show that our definition captures the intuitive notion of intent and satisfies desiderata set-out by past work. In addition, we show how our definition relates to past concepts, including actual causality, and the notion of instrumental goals, which is a core idea in the literature on safe AI agents. Finally, we demonstrate how our definition can be used to infer the intentions of reinforcement learning agents and language models from their behaviour.
- [150] arXiv:2402.07226 [ pdf , ps , other ]
-
Title: Stitching Sub-Trajectories with Conditional Diffusion Model for Goal-Conditioned Offline RLSubjects: Artificial Intelligence (cs.AI)
Abstract: Offline Goal-Conditioned Reinforcement Learning (Offline GCRL) is an important problem in RL that focuses on acquiring diverse goal-oriented skills solely from pre-collected behavior datasets. In this setting, the reward feedback is typically absent except when the goal is achieved, which makes it difficult to learn policies especially from a finite dataset of suboptimal behaviors. In addition, realistic scenarios involve long-horizon planning, which necessitates the extraction of useful skills within sub-trajectories. Recently, the conditional diffusion model has been shown to be a promising approach to generate high-quality long-horizon plans for RL. However, their practicality for the goal-conditioned setting is still limited due to a number of technical assumptions made by the methods. In this paper, we propose SSD (Sub-trajectory Stitching with Diffusion), a model-based offline GCRL method that leverages the conditional diffusion model to address these limitations. In summary, we use the diffusion model that generates future plans conditioned on the target goal and value, with the target value estimated from the goal-relabeled offline dataset. We report state-of-the-art performance in the standard benchmark set of GCRL tasks, and demonstrate the capability to successfully stitch the segments of suboptimal trajectories in the offline data to generate high-quality plans.
- [151] arXiv:2402.07234 [ pdf , ps , html , other ]
-
Title: CPSDBench: A Large Language Model Evaluation Benchmark and Baseline for Chinese Public Security DomainSubjects: Artificial Intelligence (cs.AI)
Abstract: Large Language Models (LLMs) have demonstrated significant potential and effectiveness across multiple application domains. To assess the performance of mainstream LLMs in public security tasks, this study aims to construct a specialized evaluation benchmark tailored to the Chinese public security domain--CPSDbench. CPSDbench integrates datasets related to public security collected from real-world scenarios, supporting a comprehensive assessment of LLMs across four key dimensions: text classification, information extraction, question answering, and text generation. Furthermore, this study introduces a set of innovative evaluation metrics designed to more precisely quantify the efficacy of LLMs in executing tasks related to public security. Through the in-depth analysis and evaluation conducted in this research, we not only enhance our understanding of the performance strengths and limitations of existing models in addressing public security issues but also provide references for the future development of more accurate and customized LLM models targeted at applications in this field.
- [152] arXiv:2402.07326 [ pdf , ps , other ]
-
Title: Persian Speech Emotion Recognition by Fine-Tuning TransformersSubjects: Artificial Intelligence (cs.AI) ; Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract: Given the significance of speech emotion recognition, numerous methods have been developed in recent years to create effective and efficient systems in this domain. One of these methods involves the use of pretrained transformers, fine-tuned to address this specific problem, resulting in high accuracy. Despite extensive discussions and global-scale efforts to enhance these systems, the application of this innovative and effective approach has received less attention in the context of Persian speech emotion recognition. In this article, we review the field of speech emotion recognition and its background, with an emphasis on the importance of employing transformers in this context. We present two models, one based on spectrograms and the other on the audio itself, fine-tuned using the shEMO dataset. These models significantly enhance the accuracy of previous systems, increasing it from approximately 65% to 80% on the mentioned dataset. Subsequently, to investigate the effect of multilinguality on the fine-tuning process, these same models are fine-tuned twice. First, they are fine-tuned using the English IEMOCAP dataset, and then they are fine-tuned with the Persian shEMO dataset. This results in an improved accuracy of 82% for the Persian emotion recognition system. Keywords: Persian Speech Emotion Recognition, shEMO, Self-Supervised Learning
- [153] arXiv:2402.07327 [ pdf , ps , other ]
-
Title: Multi-Modal Emotion Recognition by Text, Speech and Video Using Pretrained TransformersSubjects: Artificial Intelligence (cs.AI)
Abstract: Due to the complex nature of human emotions and the diversity of emotion representation methods in humans, emotion recognition is a challenging field. In this research, three input modalities, namely text, audio (speech), and video, are employed to generate multimodal feature vectors. For generating features for each of these modalities, pre-trained Transformer models with fine-tuning are utilized. In each modality, a Transformer model is used with transfer learning to extract feature and emotional structure. These features are then fused together, and emotion recognition is performed using a classifier. To select an appropriate fusion method and classifier, various feature-level and decision-level fusion techniques have been experimented with, and ultimately, the best model, which combines feature-level fusion by concatenating feature vectors and classification using a Support Vector Machine on the IEMOCAP multimodal dataset, achieves an accuracy of 75.42%. Keywords: Multimodal Emotion Recognition, IEMOCAP, Self-Supervised Learning, Transfer Learning, Transformer.
- [154] arXiv:2402.07350 [ pdf , ps , other ]
-
Title: Antagonistic AIComments: 17 pages, 1 figure, 5 tablesSubjects: Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC)
Abstract: The vast majority of discourse around AI development assumes that subservient, "moral" models aligned with "human values" are universally beneficial -- in short, that good AI is sycophantic AI. We explore the shadow of the sycophantic paradigm, a design space we term antagonistic AI: AI systems that are disagreeable, rude, interrupting, confrontational, challenging, etc. -- embedding opposite behaviors or values. Far from being "bad" or "immoral," we consider whether antagonistic AI systems may sometimes have benefits to users, such as forcing users to confront their assumptions, build resilience, or develop healthier relational boundaries. Drawing from formative explorations and a speculative design workshop where participants designed fictional AI technologies that employ antagonism, we lay out a design space for antagonistic AI, articulating potential benefits, design techniques, and methods of embedding antagonistic elements into user experience. Finally, we discuss the many ethical challenges of this space and identify three dimensions for the responsible design of antagonistic AI -- consent, context, and framing.
- [155] arXiv:2402.07398 [ pdf , ps , html , other ]
-
Title: VisLingInstruct: Elevating Zero-Shot Learning in Multi-Modal Language Models with Autonomous Instruction OptimizationDongsheng Zhu , Xunzhu Tang , Weidong Han , Jinghui Lu , Yukun Zhao , Guoliang Xing , Junfeng Wang , Dawei YinComments: Accepted to NAACL2024 main conferenceSubjects: Artificial Intelligence (cs.AI)
Abstract: This paper presents VisLingInstruct, a novel approach to advancing Multi-Modal Language Models (MMLMs) in zero-shot learning. Current MMLMs show impressive zero-shot abilities in multi-modal tasks, but their performance depends heavily on the quality of instructions. VisLingInstruct tackles this by autonomously evaluating and optimizing instructional texts through In-Context Learning, improving the synergy between visual perception and linguistic expression in MMLMs. Alongside this instructional advancement, we have also optimized the visual feature extraction modules in MMLMs, further augmenting their responsiveness to textual cues. Our comprehensive experiments on MMLMs, based on FlanT5 and Vicuna, show that VisLingInstruct significantly improves zero-shot performance in visual multi-modal tasks. Notably, it achieves a 13.1% and 9% increase in accuracy over the prior state-of-the-art on the TextVQA and HatefulMemes datasets.
- [156] arXiv:2402.07404 [ pdf , ps , other ]
-
Title: Enhancing Multi-Criteria Decision Analysis with AI: Integrating Analytic Hierarchy Process and GPT-4 for Automated Decision SupportComments: 24 pages, 1 figureSubjects: Artificial Intelligence (cs.AI) ; Cryptography and Security (cs.CR); Multiagent Systems (cs.MA)
Abstract: Our study presents a new framework that incorporates the Analytic Hierarchy Process (AHP) and Generative Pre-trained Transformer 4 (GPT-4) large language model (LLM), bringing novel approaches to cybersecurity Multiple-criteria Decision Making (MCDA). By utilizing the capabilities of GPT-4 autonomous agents as virtual experts, we automate the decision-making process, enhancing both efficiency and reliability. This new approach focuses on leveraging LLMs for sophisticated decision analysis, highlighting the synergy between traditional decision-making models and cutting-edge AI technologies. Our innovative methodology demonstrates significant advancements in using AI-driven agents for complex decision-making scenarios, highlighting the importance of AI in strategic cybersecurity applications. The findings reveal the transformative potential of combining AHP and LLMs, establishing a new paradigm for intelligent decision support systems in cybersecurity and beyond.
- [157] arXiv:2402.07418 [ pdf , ps , other ]
-
Title: SemTra: A Semantic Skill Translator for Cross-Domain Zero-Shot Policy AdaptationComments: AAAI 2024 Camera-ready versionSubjects: Artificial Intelligence (cs.AI)
Abstract: This work explores the zero-shot adaptation capability of semantic skills, semantically interpretable experts' behavior patterns, in cross-domain settings, where a user input in interleaved multi-modal snippets can prompt a new long-horizon task for different domains. In these cross-domain settings, we present a semantic skill translator framework SemTra which utilizes a set of multi-modal models to extract skills from the snippets, and leverages the reasoning capabilities of a pretrained language model to adapt these extracted skills to the target domain. The framework employs a two-level hierarchy for adaptation: task adaptation and skill adaptation. During task adaptation, seq-to-seq translation by the language model transforms the extracted skills into a semantic skill sequence, which is tailored to fit the cross-domain contexts. Skill adaptation focuses on optimizing each semantic skill for the target domain context, through parametric instantiations that are facilitated by language prompting and contrastive learning-based context inferences. This hierarchical adaptation empowers the framework to not only infer a complex task specification in one-shot from the interleaved multi-modal snippets, but also adapt it to new domains with zero-shot learning abilities. We evaluate our framework with Meta-World, Franka Kitchen, RLBench, and CARLA environments. The results clarify the framework's superiority in performing long-horizon tasks and adapting to different domains, showing its broad applicability in practical use cases, such as cognitive robots interpreting abstract instructions and autonomous vehicles operating under varied configurations.
- [158] arXiv:2402.07420 [ pdf , ps , other ]
-
Title: On the Transit Obfuscation ProblemSubjects: Artificial Intelligence (cs.AI)
Abstract: Concealing an intermediate point on a route or visible from a route is an important goal in some transportation and surveillance scenarios. This paper studies the Transit Obfuscation Problem, the problem of traveling from some start location to an end location while "covering" a specific transit point that needs to be concealed from adversaries. We propose the notion of transit anonymity, a quantitative guarantee of the anonymity of a specific transit point, even with a powerful adversary with full knowledge of the path planning algorithm. We propose and evaluate planning/search algorithms that satisfy this anonymity criterion.
- [159] arXiv:2402.07422 [ pdf , ps , other ]
-
Title: News Recommendation with Attention MechanismComments: 7 pages, Journal of Industrial Engineering and Applied ScienceJournal-ref: Journal of Industrial Engineering and Applied Science 2024Subjects: Artificial Intelligence (cs.AI)
Abstract: This paper explores the area of news recommendation, a key component of online information sharing. Initially, we provide a clear introduction to news recommendation, defining the core problem and summarizing current methods and notable recent algorithms. We then present our work on implementing the NRAM (News Recommendation with Attention Mechanism), an attention-based approach for news recommendation, and assess its effectiveness. Our evaluation shows that NRAM has the potential to significantly improve how news content is personalized for users on digital news platforms.
- [160] arXiv:2402.07429 [ pdf , ps , other ]
-
Title: Particle Filter SLAM for Vehicle LocalizationComments: 6 pages, Journal of Industrial Engineering and Applied ScienceJournal-ref: Journal of Industrial Engineering and Applied Science 2024Subjects: Artificial Intelligence (cs.AI)
Abstract: Simultaneous Localization and Mapping (SLAM) presents a formidable challenge in robotics, involving the dynamic construction of a map while concurrently determining the precise location of the robotic agent within an unfamiliar environment. This intricate task is further compounded by the inherent "chicken-and-egg" dilemma, where accurate mapping relies on a dependable estimation of the robot's location, and vice versa. Moreover, the computational intensity of SLAM adds an additional layer of complexity, making it a crucial yet demanding topic in the field. In our research, we address the challenges of SLAM by adopting the Particle Filter SLAM method. Our approach leverages encoded data and fiber optic gyro (FOG) information to enable precise estimation of vehicle motion, while lidar technology contributes to environmental perception by providing detailed insights into surrounding obstacles. The integration of these data streams culminates in the establishment of a Particle Filter SLAM framework, representing a key endeavor in this paper to effectively navigate and overcome the complexities associated with simultaneous localization and mapping in robotic systems.
- [161] arXiv:2402.07442 [ pdf , ps , other ]
-
Title: Game Agent Driven by Free-Form Text Command: Using LLM-based Code Generation and Behavior BranchComments: This paper is posted at JSAI 2024 ConferenceSubjects: Artificial Intelligence (cs.AI)
Abstract: Several attempts have been made to implement text command control for game agents. However, current technologies are limited to processing predefined format commands. This paper proposes a pioneering text command control system for a game agent that can understand natural language commands expressed in free-form. The proposed system uses a large language model (LLM) for code generation to interpret and transform natural language commands into behavior branch, a proposed knowledge expression based on behavior trees, which facilitates execution by the game agent. This study conducted empirical validation within a game environment that simulates a Pokémon game and involved multiple participants. The results confirmed the system's ability to understand and carry out natural language commands, representing a noteworthy in the realm of real-time language interactive game agents.
Notice for the use of this material. The copyright of this material is retained by the Japanese Society for Artificial Intelligence (JSAI). This material is published here with the agreement of JSAI. Please be complied with Copyright Law of Japan if any users wish to reproduce, make derivative work, distribute or make available to the public any part or whole thereof. All Rights Reserved, Copyright (C) The Japanese Society for Artificial Intelligence. - [162] arXiv:2402.07456 [ pdf , ps , other ]
-
Title: OS-Copilot: Towards Generalist Computer Agents with Self-ImprovementZhiyong Wu , Chengcheng Han , Zichen Ding , Zhenmin Weng , Zhoumianze Liu , Shunyu Yao , Tao Yu , Lingpeng KongComments: Project page: this https URLSubjects: Artificial Intelligence (cs.AI)
Abstract: Autonomous interaction with the computer has been a longstanding challenge with great potential, and the recent proliferation of large language models (LLMs) has markedly accelerated progress in building digital agents. However, most of these agents are designed to interact with a narrow domain, such as a specific software or website. This narrow focus constrains their applicability for general computer tasks. To this end, we introduce OS-Copilot, a framework to build generalist agents capable of interfacing with comprehensive elements in an operating system (OS), including the web, code terminals, files, multimedia, and various third-party applications. We use OS-Copilot to create FRIDAY, a self-improving embodied agent for automating general computer tasks. On GAIA, a general AI assistants benchmark, FRIDAY outperforms previous methods by 35%, showcasing strong generalization to unseen applications via accumulated skills from previous tasks. We also present numerical and quantitative evidence that FRIDAY learns to control and self-improve on Excel and Powerpoint with minimal supervision. Our OS-Copilot framework and empirical findings provide infrastructure and insights for future research toward more capable and general-purpose computer agents.
- [163] arXiv:2402.07462 [ pdf , ps , other ]
-
Title: A Hormetic Approach to the Value-Loading Problem: Preventing the Paperclip Apocalypse?Comments: 24 pages, 7 figuresSubjects: Artificial Intelligence (cs.AI) ; Computers and Society (cs.CY); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Theoretical Economics (econ.TH)
Abstract: The value-loading problem is a significant challenge for researchers aiming to create artificial intelligence (AI) systems that align with human values and preferences. This problem requires a method to define and regulate safe and optimal limits of AI behaviors. In this work, we propose HALO (Hormetic ALignment via Opponent processes), a regulatory paradigm that uses hormetic analysis to regulate the behavioral patterns of AI. Behavioral hormesis is a phenomenon where low frequencies of a behavior have beneficial effects, while high frequencies are harmful. By modeling behaviors as allostatic opponent processes, we can use either Behavioral Frequency Response Analysis (BFRA) or Behavioral Count Response Analysis (BCRA) to quantify the hormetic limits of repeatable behaviors. We demonstrate how HALO can solve the 'paperclip maximizer' scenario, a thought experiment where an unregulated AI tasked with making paperclips could end up converting all matter in the universe into paperclips. Our approach may be used to help create an evolving database of 'values' based on the hedonic calculus of repeatable behaviors with decreasing marginal utility. This positions HALO as a promising solution for the value-loading problem, which involves embedding human-aligned values into an AI system, and the weak-to-strong generalization problem, which explores whether weak models can supervise stronger models as they become more intelligent. Hence, HALO opens several research avenues that may lead to the development of a computational value system that allows an AI algorithm to learn whether the decisions it makes are right or wrong.
- [164] arXiv:2402.07477 [ pdf , ps , other ]
-
Title: Food Recommendation as Language Processing (F-RLP): A Personalized and Contextual ParadigmSubjects: Artificial Intelligence (cs.AI)
Abstract: State-of-the-art rule-based and classification-based food recommendation systems face significant challenges in becoming practical and useful. This difficulty arises primarily because most machine learning models struggle with problems characterized by an almost infinite number of classes and a limited number of samples within an unbalanced dataset. Conversely, the emergence of Large Language Models (LLMs) as recommendation engines offers a promising avenue. However, a general-purpose Recommendation as Language Processing (RLP) approach lacks the critical components necessary for effective food recommendations. To address this gap, we introduce Food Recommendation as Language Processing (F-RLP), a novel framework that offers a food-specific, tailored infrastructure. F-RLP leverages the capabilities of LLMs to maximize their potential, thereby paving the way for more accurate, personalized food recommendations.
- [165] arXiv:2402.07483 [ pdf , ps , other ]
-
Title: T-RAG: Lessons from the LLM TrenchesSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Large Language Models (LLM) have shown remarkable language capabilities fueling attempts to integrate them into applications across a wide range of domains. An important application area is question answering over private enterprise documents where the main considerations are data security, which necessitates applications that can be deployed on-prem, limited computational resources and the need for a robust application that correctly responds to queries. Retrieval-Augmented Generation (RAG) has emerged as the most prominent framework for building LLM-based applications. While building a RAG is relatively straightforward, making it robust and a reliable application requires extensive customization and relatively deep knowledge of the application domain. We share our experiences building and deploying an LLM application for question answering over private organizational documents. Our application combines the use of RAG with a finetuned open-source LLM. Additionally, our system, which we call Tree-RAG (T-RAG), uses a tree structure to represent entity hierarchies within the organization. This is used to generate a textual description to augment the context when responding to user queries pertaining to entities within the organization's hierarchy. Our evaluations show that this combination performs better than a simple RAG or finetuning implementation. Finally, we share some lessons learned based on our experiences building an LLM application for real-world use.
- [166] arXiv:2402.07507 [ pdf , ps , other ]
-
Title: Clustering Dynamics for Improved Speed Prediction Deriving from Topographical GPS RegistrationsSarah Almeida Carneiro (LIGM), Giovanni Chierchia (LIGM), Aurelie Pirayre (IFPEN), Laurent Najman (LIGM)Subjects: Artificial Intelligence (cs.AI)
Abstract: A persistent challenge in the field of Intelligent Transportation Systems is to extract accurate traffic insights from geographic regions with scarce or no data coverage. To this end, we propose solutions for speed prediction using sparse GPS data points and their associated topographical and road design features. Our goal is to investigate whether we can use similarities in the terrain and infrastructure to train a machine learning model that can predict speed in regions where we lack transportation data. For this we create a Temporally Orientated Speed Dictionary Centered on Topographically Clustered Roads, which helps us to provide speed correlations to selected feature configurations. Our results show qualitative and quantitative improvement over new and standard regression methods. The presented framework provides a fresh perspective on devising strategies for missing data traffic analysis.
- [167] arXiv:2402.07510 [ pdf , ps , other ]
-
Title: Secret Collusion Among Generative AI AgentsSumeet Ramesh Motwani , Mikhail Baranchuk , Martin Strohmeier , Vijay Bolina , Philip H.S. Torr , Lewis Hammond , Christian Schroeder de WittSubjects: Artificial Intelligence (cs.AI) ; Cryptography and Security (cs.CR)
Abstract: Recent capability increases in large language models (LLMs) open up applications in which teams of communicating generative AI agents solve joint tasks. This poses privacy and security challenges concerning the unauthorised sharing of information, or other unwanted forms of agent coordination. Modern steganographic techniques could render such dynamics hard to detect. In this paper, we comprehensively formalise the problem of secret collusion in systems of generative AI agents by drawing on relevant concepts from both the AI and security literature. We study incentives for the use of steganography, and propose a variety of mitigation measures. Our investigations result in a model evaluation framework that systematically tests capabilities required for various forms of secret collusion. We provide extensive empirical results across a range of contemporary LLMs. While the steganographic capabilities of current models remain limited, GPT-4 displays a capability jump suggesting the need for continuous monitoring of steganographic frontier model capabilities. We conclude by laying out a comprehensive research program to mitigate future risks of collusion between generative AI models.
- [168] arXiv:2402.07514 [ pdf , ps , html , other ]
-
Title: Physics-informed machine learning as a kernel methodNathan Doumèche (LPSM, EDF R&D OSIRIS), Francis Bach (DI-ENS), Claire Boyer (IUF, LPSM), Gérard Biau (LPSM)Subjects: Artificial Intelligence (cs.AI) ; Statistics Theory (math.ST)
Abstract: Physics-informed machine learning combines the expressiveness of data-based approaches with the interpretability of physical models. In this context, we consider a general regression problem where the empirical risk is regularized by a partial differential equation that quantifies the physical inconsistency. We prove that for linear differential priors, the problem can be formulated as a kernel regression task. Taking advantage of kernel theory, we derive convergence rates for the minimizer of the regularized risk and show that it converges at least at the Sobolev minimax rate. However, faster rates can be achieved, depending on the physical error. This principle is illustrated with a one-dimensional example, supporting the claim that regularizing the empirical risk with physical information can be beneficial to the statistical performance of estimators.
- [169] arXiv:2402.07536 [ pdf , ps , other ]
-
Title: BreakGPT: A Large Language Model with Multi-stage Structure for Financial Breakout DetectionSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Trading range breakout (TRB) is a key method in the technical analysis of financial trading, widely employed by traders in financial markets such as stocks, futures, and foreign exchange. However, distinguishing between true and false breakout and providing the correct rationale cause significant challenges to investors. Recently, large language models have achieved success in various downstream applications, but their effectiveness in the domain of financial breakout detection has been subpar. The reason is that the unique data and specific knowledge are required in breakout detection. To address these issues, we introduce BreakGPT, the first large language model for financial breakout detection. Furthermore, we have developed a novel framework for large language models, namely multi-stage structure, effectively reducing mistakes in downstream applications. Experimental results indicate that compared to GPT-3.5, BreakGPT improves the accuracy of answers and rational by 44%, with the multi-stage structure contributing 17.6% to the improvement. Additionally, it outperforms ChatGPT-4 by 42.07%. Our Code is publicly available: this https URL
- [170] arXiv:2402.07632 [ pdf , ps , html , other ]
-
Title: Overconfident and Unconfident AI Hinder Human-AI CollaborationSubjects: Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC)
Abstract: AI transparency is a central pillar of responsible AI deployment and effective human-AI collaboration. A critical approach is communicating uncertainty, such as displaying AI's confidence level, or its correctness likelihood (CL), to users. However, these confidence levels are often uncalibrated, either overestimating or underestimating actual CL, posing risks and harms to human-AI collaboration. This study examines the effects of uncalibrated AI confidence on users' trust in AI, AI advice adoption, and collaboration outcomes. We further examined the impact of increased transparency, achieved through trust calibration support, on these outcomes. Our results reveal that uncalibrated AI confidence leads to both the misuse of overconfident AI and disuse of unconfident AI, thereby hindering outcomes of human-AI collaboration. Deficiency of trust calibration support exacerbates this issue by making it harder to detect uncalibrated confidence, promoting misuse and disuse of AI. Conversely, trust calibration support aids in recognizing uncalibration and reducing misuse, but it also fosters distrust and causes disuse of AI. Our findings highlight the importance of AI confidence calibration for enhancing human-AI collaboration and suggest directions for AI design and regulation.
- [171] arXiv:2402.07688 [ pdf , ps , other ]
-
Title: CyberMetric: A Benchmark Dataset for Evaluating Large Language Models Knowledge in CybersecuritySubjects: Artificial Intelligence (cs.AI) ; Cryptography and Security (cs.CR)
Abstract: Large Language Models (LLMs) excel across various domains, from computer vision to medical diagnostics. However, understanding the diverse landscape of cybersecurity, encompassing cryptography, reverse engineering, and managerial facets like risk assessment, presents a challenge, even for human experts. In this paper, we introduce CyberMetric, a benchmark dataset comprising 10,000 questions sourced from standards, certifications, research papers, books, and other publications in the cybersecurity domain. The questions are created through a collaborative process, i.e., merging expert knowledge with LLMs, including GPT-3.5 and Falcon-180B. Human experts spent over 200 hours verifying their accuracy and relevance. Beyond assessing LLMs' knowledge, the dataset's main goal is to facilitate a fair comparison between humans and different LLMs in cybersecurity. To achieve this, we carefully selected 80 questions covering a wide range of topics within cybersecurity and involved 30 participants of diverse expertise levels, facilitating a comprehensive comparison between human and machine intelligence in this area. The findings revealed that LLMs outperformed humans in almost every aspect of cybersecurity.
- [172] arXiv:2402.07744 [ pdf , ps , other ]
-
Title: Towards Unified Alignment Between Agents, Humans, and EnvironmentZonghan Yang , An Liu , Zijun Liu , Kaiming Liu , Fangzhou Xiong , Yile Wang , Zeyuan Yang , Qingyuan Hu , Xinrui Chen , Zhenhe Zhang , Fuwen Luo , Zhicheng Guo , Peng Li , Yang LiuComments: Project webpage: this https URLSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: The rapid progress of foundation models has led to the prosperity of autonomous agents, which leverage the universal capabilities of foundation models to conduct reasoning, decision-making, and environmental interaction. However, the efficacy of agents remains limited when operating in intricate, realistic environments. In this work, we introduce the principles of $\mathbf{U}$nified $\mathbf{A}$lignment for $\mathbf{A}$gents ($\mathbf{UA}^2$), which advocate for the simultaneous alignment of agents with human intentions, environmental dynamics, and self-constraints such as the limitation of monetary budgets. From the perspective of $\mathbf{UA}^2$, we review the current agent research and highlight the neglected factors in existing agent benchmarks and method candidates. We also conduct proof-of-concept studies by introducing realistic features to WebShop, including user profiles to demonstrate intentions, personalized reranking for complex environmental dynamics, and runtime cost statistics to reflect self-constraints. We then follow the principles of $\mathbf{UA}^2$ to propose an initial design of our agent, and benchmark its performance with several candidate baselines in the retrofitted WebShop. The extensive experimental results further prove the importance of the principles of $\mathbf{UA}^2$. Our research sheds light on the next steps of autonomous agent research with improved general problem-solving abilities.
- [173] arXiv:2402.07772 [ pdf , ps , html , other ]
-
Title: End-to-End Learning for Fair Multiobjective Optimization Under UncertaintySubjects: Artificial Intelligence (cs.AI)
Abstract: Many decision processes in artificial intelligence and operations research are modeled by parametric optimization problems whose defining parameters are unknown and must be inferred from observable data. The Predict-Then-Optimize (PtO) paradigm in machine learning aims to maximize downstream decision quality by training the parametric inference model end-to-end with the subsequent constrained optimization. This requires backpropagation through the optimization problem using approximation techniques specific to the problem's form, especially for nondifferentiable linear and mixed-integer programs. This paper extends the PtO methodology to optimization problems with nondifferentiable Ordered Weighted Averaging (OWA) objectives, known for their ability to ensure properties of fairness and robustness in decision models. Through a collection of training techniques and proposed application settings, it shows how optimization of OWA functions can be effectively integrated with parametric prediction for fair and robust optimization under uncertainty.
- [174] arXiv:2402.07787 [ pdf , ps , html , other ]
-
Title: Extensible Multi-Granularity Fusion Network for Aspect-based Sentiment AnalysisComments: 8 pages, 4 figuresSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Aspect-based Sentiment Analysis (ABSA) evaluates sentiment expressions within a text to comprehend sentiment information. Previous studies integrated external knowledge, such as knowledge graphs, to enhance the semantic features in ABSA models. Recent research has examined the use of Graph Neural Networks (GNNs) on dependency and constituent trees for syntactic analysis. With the ongoing development of ABSA, more innovative linguistic and structural features are being incorporated (e.g. latent graph), but this also introduces complexity and confusion. As of now, a scalable framework for integrating diverse linguistic and structural features into ABSA does not exist. This paper presents the Extensible Multi-Granularity Fusion (EMGF) network, which integrates information from dependency and constituent syntactic, attention semantic , and external knowledge graphs. EMGF, equipped with multi-anchor triplet learning and orthogonal projection, efficiently harnesses the combined potential of each granularity feature and their synergistic interactions, resulting in a cumulative effect without additional computational expenses. Experimental findings on SemEval 2014 and Twitter datasets confirm EMGF's superiority over existing ABSA methods.
- [175] arXiv:2402.07799 [ pdf , ps , other ]
-
Title: Generalising Planning Environment RedesignComments: Paper accepted at AAAI'24Subjects: Artificial Intelligence (cs.AI)
Abstract: In Environment Design, one interested party seeks to affect another agent's decisions by applying changes to the environment. Most research on planning environment (re)design assumes the interested party's objective is to facilitate the recognition of goals and plans, and search over the space of environment modifications to find the minimal set of changes that simplify those tasks and optimise a particular metric. This search space is usually intractable, so existing approaches devise metric-dependent pruning techniques for performing search more efficiently. This results in approaches that are not able to generalise across different objectives and/or metrics. In this paper, we argue that the interested party could have objectives and metrics that are not necessarily related to recognising agents' goals or plans. Thus, to generalise the task of Planning Environment Redesign, we develop a general environment redesign approach that is metric-agnostic and leverages recent research on top-quality planning to efficiently redesign planning environments according to any interested party's objective and metric. Experiments over a set of environment redesign benchmarks show that our general approach outperforms existing approaches when using well-known metrics, such as facilitating the recognition of goals, as well as its effectiveness when solving environment redesign tasks that optimise a novel set of different metrics.
- [176] arXiv:2402.07822 [ pdf , ps , other ]
-
Title: Understanding fitness landscapes in morpho-evolution via local optima networksComments: Submitted to GECCO-2024Subjects: Artificial Intelligence (cs.AI)
Abstract: Morpho-evolution (ME) refers to the simultaneous optimisation of a robot's design and controller to maximise performance given a task and environment. Many genetic encodings have been proposed which are capable of representing design and control. Previous research has provided empirical comparisons between encodings in terms of their performance with respect to an objective function and the diversity of designs that are evaluated, however there has been no attempt to explain the observed findings. We address this by applying Local Optima Network (LON) analysis to investigate the structure of the fitness landscapes induced by three different encodings when evolving a robot for a locomotion task, shedding new light on the ease by which different fitness landscapes can be traversed by a search process. This is the first time LON analysis has been applied in the field of ME despite its popularity in combinatorial optimisation domains; the findings will facilitate design of new algorithms or operators that are customised to ME landscapes in the future.
- [177] arXiv:2402.07877 [ pdf , ps , html , other ]
-
Title: WildfireGPT: Tailored Large Language Model for Wildfire AnalysisYangxinyu Xie , Tanwi Mallick , Joshua David Bergerson , John K. Hutchison , Duane R. Verner , Jordan Branham , M. Ross Alexander , Robert B. Ross , Yan Feng , Leslie-Anne Levy , Weijie SuSubjects: Artificial Intelligence (cs.AI)
Abstract: The recent advancement of large language models (LLMs) represents a transformational capability at the frontier of artificial intelligence (AI) and machine learning (ML). However, LLMs are generalized models, trained on extensive text corpus, and often struggle to provide context-specific information, particularly in areas requiring specialized knowledge such as wildfire details within the broader context of climate change. For decision-makers and policymakers focused on wildfire resilience and adaptation, it is crucial to obtain responses that are not only precise but also domain-specific, rather than generic. To that end, we developed WildfireGPT, a prototype LLM agent designed to transform user queries into actionable insights on wildfire risks. We enrich WildfireGPT by providing additional context such as climate projections and scientific literature to ensure its information is current, relevant, and scientifically accurate. This enables WildfireGPT to be an effective tool for delivering detailed, user-specific insights on wildfire risks to support a diverse set of end users, including researchers, engineers, urban planners, emergency managers, and infrastructure operators.
- [178] arXiv:2402.07890 [ pdf , ps , other ]
-
Title: MAIDCRL: Semi-centralized Multi-Agent Influence Dense-CNN Reinforcement LearningComments: 2022 IEEE Conference on Games (CoG)Subjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Distributed decision-making in multi-agent systems presents difficult challenges for interactive behavior learning in both cooperative and competitive systems. To mitigate this complexity, MAIDRL presents a semi-centralized Dense Reinforcement Learning algorithm enhanced by agent influence maps (AIMs), for learning effective multi-agent control on StarCraft Multi-Agent Challenge (SMAC) scenarios. In this paper, we extend the DenseNet in MAIDRL and introduce semi-centralized Multi-Agent Dense-CNN Reinforcement Learning, MAIDCRL, by incorporating convolutional layers into the deep model architecture, and evaluate the performance on both homogeneous and heterogeneous scenarios. The results show that the CNN-enabled MAIDCRL significantly improved the learning performance and achieved a faster learning rate compared to the existing MAIDRL, especially on more complicated heterogeneous SMAC scenarios. We further investigate the stability and robustness of our model. The statistics reflect that our model not only achieves higher winning rate in all the given scenarios but also boosts the agent's learning process in fine-grained decision-making.
- [179] arXiv:2402.07925 [ pdf , ps , html , other ]
-
Title: Point and Instruct: Enabling Precise Image Editing by Unifying Direct Manipulation and Text InstructionsSubjects: Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC)
Abstract: Machine learning has enabled the development of powerful systems capable of editing images from natural language instructions. However, in many common scenarios it is difficult for users to specify precise image transformations with text alone. For example, in an image with several dogs, it is difficult to select a particular dog and move it to a precise location. Doing this with text alone would require a complex prompt that disambiguates the target dog and describes the destination. However, direct manipulation is well suited to visual tasks like selecting objects and specifying locations. We introduce Point and Instruct, a system for seamlessly combining familiar direct manipulation and textual instructions to enable precise image manipulation. With our system, a user can visually mark objects and locations, and reference them in textual instructions. This allows users to benefit from both the visual descriptiveness of natural language and the spatial precision of direct manipulation.
- [180] arXiv:2402.07927 [ pdf , ps , html , other ]
-
Title: A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and ApplicationsComments: 9 pages, 2 figuresSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Abstract: Prompt engineering has emerged as an indispensable technique for extending the capabilities of large language models (LLMs) and vision-language models (VLMs). This approach leverages task-specific instructions, known as prompts, to enhance model efficacy without modifying the core model parameters. Rather than updating the model parameters, prompts allow seamless integration of pre-trained models into downstream tasks by eliciting desired model behaviors solely based on the given prompt. Prompts can be natural language instructions that provide context to guide the model or learned vector representations that activate relevant knowledge. This burgeoning field has enabled success across various applications, from question-answering to commonsense reasoning. However, there remains a lack of systematic organization and understanding of the diverse prompt engineering methods and techniques. This survey paper addresses the gap by providing a structured overview of recent advancements in prompt engineering, categorized by application area. For each prompting approach, we provide a summary detailing the prompting methodology, its applications, the models involved, and the datasets utilized. We also delve into the strengths and limitations of each approach and include a taxonomy diagram and table summarizing datasets, models, and critical points of each prompting technique. This systematic analysis enables a better understanding of this rapidly developing field and facilitates future research by illuminating open challenges and opportunities for prompt engineering.
- [181] arXiv:2402.07963 [ pdf , ps , html , other ]
-
Title: SMX: Sequential Monte Carlo Planning for Expert IterationMatthew V Macfarlane , Edan Toledo , Donal Byrne , Siddarth Singh , Paul Duckworth , Alexandre LaterreComments: 25 pages, 5 main figuresSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Developing agents that can leverage planning abilities during their decision and learning processes is critical to the advancement of Artificial Intelligence. Recent works have demonstrated the effectiveness of combining tree-based search methods and self-play learning mechanisms. Yet, these methods typically face scaling challenges due to the sequential nature of their search. While practical engineering solutions can partly overcome this, they still demand extensive computational resources, which hinders their applicability. In this paper, we introduce SMX, a model-based planning algorithm that utilises scalable Sequential Monte Carlo methods to create an effective self-learning mechanism. Grounded in the theoretical framework of control as inference, SMX benefits from robust theoretical underpinnings. Its sampling-based search approach makes it adaptable to environments with both discrete and continuous action spaces. Furthermore, SMX allows for high parallelisation and can run on hardware accelerators to optimise computing efficiency. SMX demonstrates a statistically significant improvement in performance compared to AlphaZero, as well as demonstrating its performance as an improvement operator for a model-free policy, matching or exceeding top model-free methods across both continuous and discrete environments.
- [182] arXiv:2402.08064 [ pdf , ps , other ]
-
Title: Beyond LLMs: Advancing the Landscape of Complex ReasoningJennifer Chu-Carroll , Andrew Beck , Greg Burnham , David OS Melville , David Nachman , A. Erdem Özcan , David FerrucciSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Since the advent of Large Language Models a few years ago, they have often been considered the de facto solution for many AI problems. However, in addition to the many deficiencies of LLMs that prevent them from broad industry adoption, such as reliability, cost, and speed, there is a whole class of common real world problems that Large Language Models perform poorly on, namely, constraint satisfaction and optimization problems. These problems are ubiquitous and current solutions are highly specialized and expensive to implement. At Elemental Cognition, we developed our EC AI platform which takes a neuro-symbolic approach to solving constraint satisfaction and optimization problems. The platform employs, at its core, a precise and high performance logical reasoning engine, and leverages LLMs for knowledge acquisition and user interaction. This platform supports developers in specifying application logic in natural and concise language while generating application user interfaces to interact with users effectively. We evaluated LLMs against systems built on the EC AI platform in three domains and found the EC AI systems to significantly outperform LLMs on constructing valid and optimal solutions, on validating proposed solutions, and on repairing invalid solutions.
- [183] arXiv:2402.08088 [ pdf , ps , html , other ]
-
Title: Out-of-Distribution Detection and Data Drift Monitoring using Statistical Process ControlGhada Zamzmi , Kesavan Venkatesh , Brandon Nelson , Smriti Prathapan , Paul H. Yi , Berkman Sahiner , Jana G. DelfinoSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Abstract: Background: Machine learning (ML) methods often fail with data that deviates from their training distribution. This is a significant concern for ML-enabled devices in clinical settings, where data drift may cause unexpected performance that jeopardizes patient safety.
Method: We propose a ML-enabled Statistical Process Control (SPC) framework for out-of-distribution (OOD) detection and drift monitoring. SPC is advantageous as it visually and statistically highlights deviations from the expected distribution. To demonstrate the utility of the proposed framework for monitoring data drift in radiological images, we investigated different design choices, including methods for extracting feature representations, drift quantification, and SPC parameter selection.
Results: We demonstrate the effectiveness of our framework for two tasks: 1) differentiating axial vs. non-axial computed tomography (CT) images and 2) separating chest x-ray (CXR) from other modalities. For both tasks, we achieved high accuracy in detecting OOD inputs, with 0.913 in CT and 0.995 in CXR, and sensitivity of 0.980 in CT and 0.984 in CXR. Our framework was also adept at monitoring data streams and identifying the time a drift occurred. In a simulation with 100 daily CXR cases, we detected a drift in OOD input percentage from 0-1% to 3-5% within two days, maintaining a low false-positive rate. Through additional experimental results, we demonstrate the framework's data-agnostic nature and independence from the underlying model's structure.
Conclusion: We propose a framework for OOD detection and drift monitoring that is agnostic to data, modality, and model. The framework is customizable and can be adapted for specific applications. - [184] arXiv:2402.08115 [ pdf , ps , other ]
-
Title: On the Self-Verification Limitations of Large Language Models on Reasoning and Planning TasksComments: arXiv admin note: text overlap with arXiv:2310.12397Subjects: Artificial Intelligence (cs.AI)
Abstract: There has been considerable divergence of opinion on the reasoning abilities of Large Language Models (LLMs). While the initial optimism that reasoning might emerge automatically with scale has been tempered thanks to a slew of counterexamples--ranging from multiplication to simple planning--there persists a wide spread belief that LLMs can self-critique and improve their own solutions in an iterative fashion. This belief seemingly rests on the assumption that verification of correctness should be easier than generation--a rather classical argument from computational complexity--which should be irrelevant to LLMs to the extent that what they are doing is approximate retrieval. In this paper, we set out to systematically investigate the effectiveness of iterative prompting in the context of reasoning and planning. We present a principled empirical study of the performance of GPT-4 in three domains: Game of 24, Graph Coloring, and STRIPS planning. We experiment both with the model critiquing its own answers and with an external correct reasoner verifying proposed solutions. In each case, we analyze whether the content of criticisms actually affects bottom line performance, and whether we can ablate elements of the augmented system without losing performance. We observe significant performance collapse with self-critique, significant performance gains with sound external verification, but that the content of critique doesn't matter to the performance of the system. In fact, merely re-prompting with a sound verifier maintains most of the benefits of more involved setups.
- [185] arXiv:2402.08128 [ pdf , ps , html , other ]
-
Title: Recursive Joint Simulation in GamesSubjects: Artificial Intelligence (cs.AI) ; Computer Science and Game Theory (cs.GT)
Abstract: Game-theoretic dynamics between AI agents could differ from traditional human-human interactions in various ways. One such difference is that it may be possible to accurately simulate an AI agent, for example because its source code is known. Our aim is to explore ways of leveraging this possibility to achieve more cooperative outcomes in strategic settings. In this paper, we study an interaction between AI agents where the agents run a recursive joint simulation. That is, the agents first jointly observe a simulation of the situation they face. This simulation in turn recursively includes additional simulations (with a small chance of failure, to avoid infinite recursion), and the results of all these nested simulations are observed before an action is chosen. We show that the resulting interaction is strategically equivalent to an infinitely repeated version of the original game, allowing a direct transfer of existing results such as the various folk theorems.
- [186] arXiv:2402.08145 [ pdf , ps , html , other ]
-
Title: Epistemic Exploration for Generalizable Planning and Learning in Non-Stationary SettingsComments: To appear at ICAPS-24Subjects: Artificial Intelligence (cs.AI)
Abstract: This paper introduces a new approach for continual planning and model learning in non-stationary stochastic environments expressed using relational representations. Such capabilities are essential for the deployment of sequential decision-making systems in the uncertain, constantly evolving real world. Working in such practical settings with unknown (and non-stationary) transition systems and changing tasks, the proposed framework models gaps in the agent's current state of knowledge and uses them to conduct focused, investigative explorations. Data collected using these explorations is used for learning generalizable probabilistic models for solving the current task despite continual changes in the environment dynamics. Empirical evaluations on several benchmark domains show that this approach significantly outperforms planning and RL baselines in terms of sample complexity in non-stationary settings. Theoretical results show that the system reverts to exhibit desirable convergence properties when stationarity holds.
- [187] arXiv:2402.08174 [ pdf , ps , html , other ]
-
Title: Hierarchical Position Embedding of Graphs with Landmarks and Clustering for Link PredictionComments: The International World Wide Web Conference (WWW) 2024, Accepted paperSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Learning positional information of nodes in a graph is important for link prediction tasks. We propose a representation of positional information using representative nodes called landmarks. A small number of nodes with high degree centrality are selected as landmarks, which serve as reference points for the nodes' positions. We justify this selection strategy for well-known random graph models and derive closed-form bounds on the average path lengths involving landmarks. In a model for power-law graphs, we prove that landmarks provide asymptotically exact information on inter-node distances. We apply theoretical insights to practical networks and propose Hierarchical Position embedding with Landmarks and Clustering (HPLC). HPLC combines landmark selection and graph clustering, where the graph is partitioned into densely connected clusters in which nodes with the highest degree are selected as landmarks. HPLC leverages the positional information of nodes based on landmarks at various levels of hierarchy such as nodes' distances to landmarks, inter-landmark distances and hierarchical grouping of clusters. Experiments show that HPLC achieves state-of-the-art performances of link prediction on various datasets in terms of HIT@K, MRR, and AUC. The code is available at \url{ this https URL }.
- [188] arXiv:2402.08178 [ pdf , ps , html , other ]
-
Title: LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied AgentsComments: ICLR 2024. Code: this https URLSubjects: Artificial Intelligence (cs.AI)
Abstract: Large language models (LLMs) have recently received considerable attention as alternative solutions for task planning. However, comparing the performance of language-oriented task planners becomes difficult, and there exists a dearth of detailed exploration regarding the effects of various factors such as pre-trained model selection and prompt construction. To address this, we propose a benchmark system for automatically quantifying performance of task planning for home-service embodied agents. Task planners are tested on two pairs of datasets and simulators: 1) ALFRED and AI2-THOR, 2) an extension of Watch-And-Help and VirtualHome. Using the proposed benchmark system, we perform extensive experiments with LLMs and prompts, and explore several enhancements of the baseline planner. We expect that the proposed benchmark tool would accelerate the development of language-oriented task planners.
- [189] arXiv:2402.08184 [ pdf , ps , html , other ]
-
Title: Enabling Multi-Agent Transfer Reinforcement Learning via Scenario Independent RepresentationComments: 2023 IEEE Conference on Games (CoG)Subjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Multi-Agent Reinforcement Learning (MARL) algorithms are widely adopted in tackling complex tasks that require collaboration and competition among agents in dynamic Multi-Agent Systems (MAS). However, learning such tasks from scratch is arduous and may not always be feasible, particularly for MASs with a large number of interactive agents due to the extensive sample complexity. Therefore, reusing knowledge gained from past experiences or other agents could efficiently accelerate the learning process and upscale MARL algorithms. In this study, we introduce a novel framework that enables transfer learning for MARL through unifying various state spaces into fixed-size inputs that allow one unified deep-learning policy viable in different scenarios within a MAS. We evaluated our approach in a range of scenarios within the StarCraft Multi-Agent Challenge (SMAC) environment, and the findings show significant enhancements in multi-agent learning performance using maneuvering skills learned from other scenarios compared to agents learning from scratch. Furthermore, we adopted Curriculum Transfer Learning (CTL), enabling our deep learning policy to progressively acquire knowledge and skills across pre-designed homogeneous learning scenarios organized by difficulty levels. This process promotes inter- and intra-agent knowledge transfer, leading to high multi-agent learning performance in more complicated heterogeneous scenarios.
- [190] arXiv:2402.08185 [ pdf , ps , html , other ]
-
Title: Advancing Data-driven Weather Forecasting: Time-Sliding Data Augmentation of ERA5Subjects: Artificial Intelligence (cs.AI) ; Computer Vision and Pattern Recognition (cs.CV); Atmospheric and Oceanic Physics (physics.ao-ph)
Abstract: Modern deep learning techniques, which mimic traditional numerical weather prediction (NWP) models and are derived from global atmospheric reanalysis data, have caused a significant revolution within a few years. In this new paradigm, our research introduces a novel strategy that deviates from the common dependence on high-resolution data, which is often constrained by computational resources, and instead utilizes low-resolution data (2.5 degrees) for global weather prediction and climate data analysis. Our main focus is evaluating data-driven weather prediction (DDWP) frameworks, specifically addressing sample size adequacy, structural improvements to the model, and the ability of climate data to represent current climatic trends. By using the Adaptive Fourier Neural Operator (AFNO) model via FourCastNet and a proposed time-sliding method to inflate the dataset of the ECMWF Reanalysis v5 (ERA5), this paper improves on conventional approaches by adding more variables and a novel approach to data augmentation and processing. Our findings reveal that despite the lower resolution, the proposed approach demonstrates considerable accuracy in predicting atmospheric conditions, effectively rivaling higher-resolution models. Furthermore, the study confirms the model's proficiency in reflecting current climate trends and its potential in predicting future climatic events, underscoring its utility in climate change strategies. This research marks a pivotal step in the realm of meteorological forecasting, showcasing the feasibility of lower-resolution data in producing reliable predictions and opening avenues for more accessible and inclusive climate modeling. The insights gleaned from this study not only contribute to the advancement of climate science but also lay the groundwork for future innovations in the field.
- [191] arXiv:2402.08208 [ pdf , ps , html , other ]
-
Title: Inherent Diverse Redundant Safety Mechanisms for AI-based Software Elements in Automotive ApplicationsComments: This article is accepted for the SAE WCX 2024 conference proceedingsSubjects: Artificial Intelligence (cs.AI)
Abstract: This paper explores the role and challenges of Artificial Intelligence (AI) algorithms, specifically AI-based software elements, in autonomous driving systems. These AI systems are fundamental in executing real-time critical functions in complex and high-dimensional environments. They handle vital tasks like multi-modal perception, cognition, and decision-making tasks such as motion planning, lane keeping, and emergency braking. A primary concern relates to the ability (and necessity) of AI models to generalize beyond their initial training data. This generalization issue becomes evident in real-time scenarios, where models frequently encounter inputs not represented in their training or validation data. In such cases, AI systems must still function effectively despite facing distributional or domain shifts. This paper investigates the risk associated with overconfident AI models in safety-critical applications like autonomous driving. To mitigate these risks, methods for training AI models that help maintain performance without overconfidence are proposed. This involves implementing certainty reporting architectures and ensuring diverse training data. While various distribution-based methods exist to provide safety mechanisms for AI models, there is a noted lack of systematic assessment of these methods, especially in the context of safety-critical automotive applications. Many methods in the literature do not adapt well to the quick response times required in safety-critical edge applications. This paper reviews these methods, discusses their suitability for safety-critical applications, and highlights their strengths and limitations. The paper also proposes potential improvements to enhance the safety and reliability of AI algorithms in autonomous vehicles in the context of rapid and accurate decision-making processes.
- [192] arXiv:2402.08211 [ pdf , ps , html , other ]
-
Title: Transformer Mechanisms Mimic Frontostriatal Gating Operations When Trained on Human Working Memory TasksComments: 8 pages, 4 figuresSubjects: Artificial Intelligence (cs.AI)
Abstract: Models based on the Transformer neural network architecture have seen success on a wide variety of tasks that appear to require complex "cognitive branching" -- or the ability to maintain pursuit of one goal while accomplishing others. In cognitive neuroscience, success on such tasks is thought to rely on sophisticated frontostriatal mechanisms for selective \textit{gating}, which enable role-addressable updating -- and later readout -- of information to and from distinct "addresses" of memory, in the form of clusters of neurons. However, Transformer models have no such mechanisms intentionally built-in. It is thus an open question how Transformers solve such tasks, and whether the mechanisms that emerge to help them to do so bear any resemblance to the gating mechanisms in the human brain. In this work, we analyze the mechanisms that emerge within a vanilla attention-only Transformer trained on a simple sequence modeling task inspired by a task explicitly designed to study working memory gating in computational cognitive neuroscience. We find that, as a result of training, the self-attention mechanism within the Transformer specializes in a way that mirrors the input and output gating mechanisms which were explicitly incorporated into earlier, more biologically-inspired architectures. These results suggest opportunities for future research on computational similarities between modern AI architectures and models of the human brain.
- [193] arXiv:2402.08236 [ pdf , ps , html , other ]
-
Title: BERT4FCA: A Method for Bipartite Link Prediction using Formal Concept Analysis and BERTComments: 23 pages, 5 figuresSubjects: Artificial Intelligence (cs.AI)
Abstract: We propose BERT4FCA, a novel method for link prediction in bipartite networks, using formal concept analysis (FCA) and BERT. Link prediction in bipartite networks is an important task that can solve various practical problems like friend recommendation in social networks and co-authorship prediction in author-paper networks. Recent research has found that in bipartite networks, maximal bi-cliques provide important information for link prediction, and they can be extracted by FCA. Some FCA-based bipartite link prediction methods have achieved good performance. However, we figured out that their performance could be further improved because these methods did not fully capture the rich information of the extracted maximal bi-cliques. To address this limitation, we propose an approach using BERT, which can learn more information from the maximal bi-cliques extracted by FCA and use them to make link prediction. We conduct experiments on three real-world bipartite networks and demonstrate that our method outperforms previous FCA-based methods, and some classic methods such as matrix-factorization and node2vec.
- [194] arXiv:2402.08242 [ pdf , ps , html , other ]
-
Title: Towards Equitable Agile Research and Development of AI and RoboticsComments: 15 pages (32 with refs + appendix), 2 figures, 1 table (7 with appendix), incorporates changes based on WeRobot 2023 Draft feedbackSubjects: Artificial Intelligence (cs.AI) ; Computers and Society (cs.CY); Machine Learning (cs.LG); Robotics (cs.RO); Software Engineering (cs.SE)
Abstract: Machine Learning (ML) and 'Artificial Intelligence' ('AI') methods tend to replicate and amplify existing biases and prejudices, as do Robots with AI. For example, robots with facial recognition have failed to identify Black Women as human, while others have categorized people, such as Black Men, as criminals based on appearance alone. A 'culture of modularity' means harms are perceived as 'out of scope', or someone else's responsibility, throughout employment positions in the 'AI supply chain'. Incidents are routine enough ( this http URL lists over 2000 examples) to indicate that few organizations are capable of completely respecting peoples' rights; meeting claimed equity, diversity, and inclusion (EDI or DEI) goals; or recognizing and then addressing such failures in their organizations and artifacts. We propose a framework for adapting widely practiced Research and Development (R&D) project management methodologies to build organizational equity capabilities and better integrate known evidence-based best practices. We describe how project teams can organize and operationalize the most promising practices, skill sets, organizational cultures, and methods to detect and address rights-based fairness, equity, accountability, and ethical problems as early as possible when they are often less harmful and easier to mitigate; then monitor for unforeseen incidents to adaptively and constructively address them. Our primary example adapts an Agile development process based on Scrum, one of the most widely adopted approaches to organizing R&D teams. We also discuss limitations of our proposed framework and future research directions.
- [195] arXiv:2402.08250 [ pdf , ps , other ]
-
Title: A survey of recent methods for addressing AI fairness and bias in biomedicineSubjects: Artificial Intelligence (cs.AI)
Abstract: Artificial intelligence (AI) systems have the potential to revolutionize clinical practices, including improving diagnostic accuracy and surgical decision-making, while also reducing costs and manpower. However, it is important to recognize that these systems may perpetuate social inequities or demonstrate biases, such as those based on race or gender. Such biases can occur before, during, or after the development of AI models, making it critical to understand and address potential biases to enable the accurate and reliable application of AI models in clinical settings. To mitigate bias concerns during model development, we surveyed recent publications on different debiasing methods in the fields of biomedical natural language processing (NLP) or computer vision (CV). Then we discussed the methods that have been applied in the biomedical domain to address bias. We performed our literature search on PubMed, ACM digital library, and IEEE Xplore of relevant articles published between January 2018 and December 2023 using multiple combinations of keywords. We then filtered the result of 10,041 articles automatically with loose constraints, and manually inspected the abstracts of the remaining 890 articles to identify the 55 articles included in this review. Additional articles in the references are also included in this review. We discuss each method and compare its strengths and weaknesses. Finally, we review other potential methods from the general domain that could be applied to biomedicine to address bias and improve fairness.The bias of AIs in biomedicine can originate from multiple sources. Existing debiasing methods that focus on algorithms can be categorized into distributional or algorithmic.
- [196] arXiv:2402.08269 [ pdf , ps , other ]
-
Title: Geometry-induced Implicit Regularization in Deep ReLU Neural NetworksSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC); Statistics Theory (math.ST)
Abstract: It is well known that neural networks with many more parameters than training examples do not overfit. Implicit regularization phenomena, which are still not well understood, occur during optimization and 'good' networks are favored. Thus the number of parameters is not an adequate measure of complexity if we do not consider all possible networks but only the 'good' ones. To better understand which networks are favored during optimization, we study the geometry of the output set as parameters vary. When the inputs are fixed, we prove that the dimension of this set changes and that the local dimension, called batch functional dimension, is almost surely determined by the activation patterns in the hidden layers. We prove that the batch functional dimension is invariant to the symmetries of the network parameterization: neuron permutations and positive rescalings. Empirically, we establish that the batch functional dimension decreases during optimization. As a consequence, optimization leads to parameters with low batch functional dimensions. We call this phenomenon geometry-induced implicit regularization.The batch functional dimension depends on both the network parameters and inputs. To understand the impact of the inputs, we study, for fixed parameters, the largest attainable batch functional dimension when the inputs vary. We prove that this quantity, called computable full functional dimension, is also invariant to the symmetries of the network's parameterization, and is determined by the achievable activation patterns. We also provide a sampling theorem, showing a fast convergence of the estimation of the computable full functional dimension for a random input of increasing size. Empirically we find that the computable full functional dimension remains close to the number of parameters, which is related to the notion of local identifiability. This differs from the observed values for the batch functional dimension computed on training inputs and test inputs. The latter are influenced by geometry-induced implicit regularization.
- [197] arXiv:2402.08280 [ pdf , ps , other ]
-
Title: Pix2Code: Learning to Compose Neural Visual Concepts as ProgramsSubjects: Artificial Intelligence (cs.AI) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: The challenge in learning abstract concepts from images in an unsupervised fashion lies in the required integration of visual perception and generalizable relational reasoning. Moreover, the unsupervised nature of this task makes it necessary for human users to be able to understand a model's learnt concepts and potentially revise false behaviours. To tackle both the generalizability and interpretability constraints of visual concept learning, we propose Pix2Code, a framework that extends program synthesis to visual relational reasoning by utilizing the abilities of both explicit, compositional symbolic and implicit neural representations. This is achieved by retrieving object representations from images and synthesizing relational concepts as lambda-calculus programs. We evaluate the diverse properties of Pix2Code on the challenging reasoning domains, Kandinsky Patterns and CURI, thereby testing its ability to identify compositional visual concepts that generalize to novel data and concept configurations. Particularly, in stark contrast to neural approaches, we show that Pix2Code's representations remain human interpretable and can be easily revised for improved performance.
- [198] arXiv:2402.08284 [ pdf , ps , other ]
-
Title: A Logical Approach to Criminal Case InvestigationComments: 11 pages, 11 figuresSubjects: Artificial Intelligence (cs.AI)
Abstract: XAI (eXplanable AI) techniques that have the property of explaining the reasons for their conclusions, i.e. explainability or interpretability, are attracting attention. XAI is expected to be used in the development of forensic science and the justice system. In today's forensic and criminal investigation environment, experts face many challenges due to large amounts of data, small pieces of evidence in a chaotic and complex environment, traditional laboratory structures and sometimes inadequate knowledge. All these can lead to failed investigations and miscarriages of justice. In this paper, we describe the application of one logical approach to crime scene investigation. The subject of the application is ``The Adventure of the Speckled Band'' from the Sherlock Holmes short stories. The applied data is the knowledge graph created for the Knowledge Graph Reasoning Challenge. We tried to find the murderer by inferring each person with the motive, opportunity, and method. We created an ontology of motives and methods of murder from dictionaries and dictionaries, added it to the knowledge graph of ``The Adventure of the Speckled Band'', and applied scripts to determine motives, opportunities, and methods.
- [199] arXiv:2402.08298 [ pdf , ps , html , other ]
-
Title: Time to Stop and Think: What kind of research do we want to do?Subjects: Artificial Intelligence (cs.AI)
Abstract: Experimentation is an intrinsic part of research in artificial intelligence since it allows for collecting quantitative observations, validating hypotheses, and providing evidence for their reformulation. For that reason, experimentation must be coherent with the purposes of the research, properly addressing the relevant questions in each case. Unfortunately, the literature is full of works whose experimentation is neither rigorous nor convincing, oftentimes designed to support prior beliefs rather than answering the relevant research questions.
In this paper, we focus on the field of metaheuristic optimization, since it is our main field of work, and it is where we have observed the misconduct that has motivated this letter. Even if we limit the focus of this manuscript to the experimental part of the research, our main goal is to sew the seed of sincere critical assessment of our work, sparking a reflection process both at the individual and the community level. Such a reflection process is too complex and extensive to be tackled as a whole. Therefore, to bring our feet to the ground, we will include in this document our reflections about the role of experimentation in our work, discussing topics such as the use of benchmark instances vs instance generators, or the statistical assessment of empirical results. That is, all the statements included in this document are personal views and opinions, which can be shared by others or not. Certainly, having different points of view is the basis to establish a good discussion process. - [200] arXiv:2402.08369 [ pdf , ps , other ]
-
Title: One-shot Imitation in a Non-Stationary Environment via Multi-Modal SkillComments: ICML-2023 Camera Ready VersionSubjects: Artificial Intelligence (cs.AI)
Abstract: One-shot imitation is to learn a new task from a single demonstration, yet it is a challenging problem to adopt it for complex tasks with the high domain diversity inherent in a non-stationary environment. To tackle the problem, we explore the compositionality of complex tasks, and present a novel skill-based imitation learning framework enabling one-shot imitation and zero-shot adaptation; from a single demonstration for a complex unseen task, a semantic skill sequence is inferred and then each skill in the sequence is converted into an action sequence optimized for environmental hidden dynamics that can vary over time. Specifically, we leverage a vision-language model to learn a semantic skill set from offline video datasets, where each skill is represented on the vision-language embedding space, and adapt meta-learning with dynamics inference to enable zero-shot skill adaptation. We evaluate our framework with various one-shot imitation scenarios for extended multi-stage Meta-world tasks, showing its superiority in learning complex tasks, generalizing to dynamics changes, and extending to different demonstration conditions and modalities, compared to other baselines.
- [201] arXiv:2402.08423 [ pdf , ps , other ]
-
Title: Vehicle Behavior Prediction by Episodic-Memory Implanted NDTComments: Accepted by ICRA2024Subjects: Artificial Intelligence (cs.AI)
Abstract: In autonomous driving, predicting the behavior (turning left, stopping, etc.) of target vehicles is crucial for the self-driving vehicle to make safe decisions and avoid accidents. Existing deep learning-based methods have shown excellent and accurate performance, but the black-box nature makes it untrustworthy to apply them in practical use. In this work, we explore the interpretability of behavior prediction of target vehicles by an Episodic Memory implanted Neural Decision Tree (abbrev. eMem-NDT). The structure of eMem-NDT is constructed by hierarchically clustering the text embedding of vehicle behavior descriptions. eMem-NDT is a neural-backed part of a pre-trained deep learning model by changing the soft-max layer of the deep model to eMem-NDT, for grouping and aligning the memory prototypes of the historical vehicle behavior features in training data on a neural decision tree. Each leaf node of eMem-NDT is modeled by a neural network for aligning the behavior memory prototypes. By eMem-NDT, we infer each instance in behavior prediction of vehicles by bottom-up Memory Prototype Matching (MPM) (searching the appropriate leaf node and the links to the root node) and top-down Leaf Link Aggregation (LLA) (obtaining the probability of future behaviors of vehicles for certain instances). We validate eMem-NDT on BLVD and LOKI datasets, and the results show that our model can obtain a superior performance to other methods with clear explainability. The code is available at this https URL .
- [202] arXiv:2402.08466 [ pdf , ps , other ]
-
Title: Taking Training Seriously: Human Guidance and Management-Based Regulation of Artificial IntelligenceComments: 12 pages, 1 figureSubjects: Artificial Intelligence (cs.AI) ; Computer Vision and Pattern Recognition (cs.CV)
Abstract: Fervent calls for more robust governance of the harms associated with artificial intelligence (AI) are leading to the adoption around the world of what regulatory scholars have called a management-based approach to regulation. Recent initiatives in the United States and Europe, as well as the adoption of major self-regulatory standards by the International Organization for Standardization, share in common a core management-based paradigm. These management-based initiatives seek to motivate an increase in human oversight of how AI tools are trained and developed. Refinements and systematization of human-guided training techniques will thus be needed to fit within this emerging era of management-based regulatory paradigm. If taken seriously, human-guided training can alleviate some of the technical and ethical pressures on AI, boosting AI performance with human intuition as well as better addressing the needs for fairness and effective explainability. In this paper, we discuss the connection between the emerging management-based regulatory frameworks governing AI and the need for human oversight during training. We broadly cover some of the technical components involved in human-guided training and then argue that the kinds of high-stakes use cases for AI that appear of most concern to regulators should lean more on human-guided training than on data-only training. We hope to foster a discussion between legal scholars and computer scientists involving how to govern a domain of technology that is vast, heterogenous, and dynamic in its applications and risks.
- [203] arXiv:2402.08472 [ pdf , ps , other ]
-
Title: Large Language Models for the Automated Analysis of Optimization AlgorithmsComments: Submitted to the GECCO 2024 conferenceSubjects: Artificial Intelligence (cs.AI)
Abstract: The ability of Large Language Models (LLMs) to generate high-quality text and code has fuelled their rise in popularity. In this paper, we aim to demonstrate the potential of LLMs within the realm of optimization algorithms by integrating them into STNWeb. This is a web-based tool for the generation of Search Trajectory Networks (STNs), which are visualizations of optimization algorithm behavior. Although visualizations produced by STNWeb can be very informative for algorithm designers, they often require a certain level of prior knowledge to be interpreted. In an attempt to bridge this knowledge gap, we have incorporated LLMs, specifically GPT-4, into STNWeb to produce extensive written reports, complemented by automatically generated plots, thereby enhancing the user experience and reducing the barriers to the adoption of this tool by the research community. Moreover, our approach can be expanded to other tools from the optimization community, showcasing the versatility and potential of LLMs in this field.
- [204] arXiv:2402.08492 [ pdf , ps , other ]
-
Title: The Application of ChatGPT in Responding to Questions Related to the Boston Bowel Preparation ScaleXiaoqiang Liu , Yubin Wang , Zicheng Huang , Boming Xu , Yilin Zeng , Xinqi Chen , Zilong Wang , Enning Yang , Xiaoxuan Lei , Yisen Huang , Xiaobo LiuSubjects: Artificial Intelligence (cs.AI)
Abstract: Background: Colonoscopy, a crucial diagnostic tool in gastroenterology, depends heavily on superior bowel preparation. ChatGPT, a large language model with emergent intelligence which also exhibits potential in medical applications. This study aims to assess the accuracy and consistency of ChatGPT in using the Boston Bowel Preparation Scale (BBPS) for colonoscopy assessment. Methods: We retrospectively collected 233 colonoscopy images from 2020 to 2023. These images were evaluated using the BBPS by 3 senior endoscopists and 3 novice endoscopists. Additionally, ChatGPT also assessed these images, having been divided into three groups and undergone specific Fine-tuning. Consistency was evaluated through two rounds of testing. Results: In the initial round, ChatGPT's accuracy varied between 48.93% and 62.66%, trailing the endoscopists' accuracy of 76.68% to 77.83%. Kappa values for ChatGPT was between 0.52 and 0.53, compared to 0.75 to 0.87 for the endoscopists. Conclusion: While ChatGPT shows promise in bowel preparation scoring, it currently does not match the accuracy and consistency of experienced endoscopists. Future research should focus on in-depth Fine-tuning.
- [205] arXiv:2402.08511 [ pdf , ps , other ]
-
Title: Amplifying Exploration in Monte-Carlo Tree Search by Focusing on the UnknownComments: 10 pages, 7 figuresSubjects: Artificial Intelligence (cs.AI)
Abstract: Monte-Carlo tree search (MCTS) is an effective anytime algorithm with a vast amount of applications. It strategically allocates computational resources to focus on promising segments of the search tree, making it a very attractive search algorithm in large search spaces. However, it often expends its limited resources on reevaluating previously explored regions when they remain the most promising path. Our proposed methodology, denoted as AmEx-MCTS, solves this problem by introducing a novel MCTS formulation. Central to AmEx-MCTS is the decoupling of value updates, visit count updates, and the selected path during the tree search, thereby enabling the exclusion of already explored subtrees or leaves. This segregation preserves the utility of visit counts for both exploration-exploitation balancing and quality metrics within MCTS. The resultant augmentation facilitates in a considerably broader search using identical computational resources, preserving the essential characteristics of MCTS. The expanded coverage not only yields more precise estimations but also proves instrumental in larger and more complex problems. Our empirical evaluation demonstrates the superior performance of AmEx-MCTS, surpassing classical MCTS and related approaches by a substantial margin.
- [206] arXiv:2402.08514 [ pdf , ps , other ]
-
Title: Counterfactual Influence in Markov Decision ProcessesComments: 12 pages, 6 figuresSubjects: Artificial Intelligence (cs.AI)
Abstract: Our work addresses a fundamental problem in the context of counterfactual inference for Markov Decision Processes (MDPs). Given an MDP path $\tau$, this kind of inference allows us to derive counterfactual paths $\tau'$ describing what-if versions of $\tau$ obtained under different action sequences than those observed in $\tau$. However, as the counterfactual states and actions deviate from the observed ones over time, the observation $\tau$ may no longer influence the counterfactual world, meaning that the analysis is no longer tailored to the individual observation, resulting in interventional outcomes rather than counterfactual ones. Even though this issue specifically affects the popular Gumbel-max structural causal model used for MDP counterfactuals, it has remained overlooked until now. In this work, we introduce a formal characterisation of influence based on comparing counterfactual and interventional distributions. We devise an algorithm to construct counterfactual models that automatically satisfy influence constraints. Leveraging such models, we derive counterfactual policies that are not just optimal for a given reward structure but also remain tailored to the observed path. Even though there is an unavoidable trade-off between policy optimality and strength of influence constraints, our experiments demonstrate that it is possible to derive (near-)optimal policies while remaining under the influence of the observation.
- [207] arXiv:2402.08565 [ pdf , ps , other ]
-
Title: Artificial Intelligence for Literature Reviews: Opportunities and ChallengesSubjects: Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)
Abstract: This manuscript presents a comprehensive review of the use of Artificial Intelligence (AI) in Systematic Literature Reviews (SLRs). A SLR is a rigorous and organised methodology that assesses and integrates previous research on a given topic. Numerous tools have been developed to assist and partially automate the SLR process. The increasing role of AI in this field shows great potential in providing more effective support for researchers, moving towards the semi-automatic creation of literature reviews. Our study focuses on how AI techniques are applied in the semi-automation of SLRs, specifically in the screening and extraction phases. We examine 21 leading SLR tools using a framework that combines 23 traditional features with 11 AI features. We also analyse 11 recent tools that leverage large language models for searching the literature and assisting academic writing. Finally, the paper discusses current trends in the field, outlines key research challenges, and suggests directions for future research.
- [208] arXiv:2402.08644 [ pdf , ps , html , other ]
-
Title: Tandem Transformers for Inference Efficient LLMsAishwarya P S , Pranav Ajit Nair , Yashas Samaga , Toby Boyd , Sanjiv Kumar , Prateek Jain , Praneeth NetrapalliSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: The autoregressive nature of conventional large language models (LLMs) inherently limits inference speed, as tokens are generated sequentially. While speculative and parallel decoding techniques attempt to mitigate this, they face limitations: either relying on less accurate smaller models for generation or failing to fully leverage the base LLM's representations.
We introduce a novel architecture, Tandem transformers, to address these issues. This architecture uniquely combines (1) a small autoregressive model and (2) a large model operating in block mode (processing multiple tokens simultaneously). The small model's predictive accuracy is substantially enhanced by granting it attention to the large model's richer representations. On the PaLM2 pretraining dataset, a tandem of PaLM2-Bison and PaLM2-Gecko demonstrates a 3.3% improvement in next-token prediction accuracy over a standalone PaLM2-Gecko, offering a 1.16x speedup compared to a PaLM2-Otter model with comparable downstream performance. We further incorporate the tandem model within the speculative decoding (SPEED) framework where the large model validates tokens from the small model. This ensures that the Tandem of PaLM2-Bison and PaLM2-Gecko achieves substantial speedup (around 1.14x faster than using vanilla PaLM2-Gecko in SPEED) while maintaining identical downstream task accuracy. - [209] arXiv:2402.08646 [ pdf , ps , other ]
-
Title: Inference of Abstraction for a Unified Account of Symbolic Reasoning from DataSubjects: Artificial Intelligence (cs.AI)
Abstract: Inspired by empirical work in neuroscience for Bayesian approaches to brain function, we give a unified probabilistic account of various types of symbolic reasoning from data. We characterise them in terms of formal logic using the classical consequence relation, an empirical consequence relation, maximal consistent sets, maximal possible sets and maximum likelihood estimation. The theory gives new insights into reasoning towards human-like machine intelligence.
- [210] arXiv:2402.08670 [ pdf , ps , other ]
-
Title: Rec-GPT4V: Multimodal Recommendation with Large Vision-Language ModelsComments: under reviewSubjects: Artificial Intelligence (cs.AI)
Abstract: The development of large vision-language models (LVLMs) offers the potential to address challenges faced by traditional multimodal recommendations thanks to their proficient understanding of static images and textual dynamics. However, the application of LVLMs in this field is still limited due to the following complexities: First, LVLMs lack user preference knowledge as they are trained from vast general datasets. Second, LVLMs suffer setbacks in addressing multiple image dynamics in scenarios involving discrete, noisy, and redundant image sequences. To overcome these issues, we propose the novel reasoning scheme named Rec-GPT4V: Visual-Summary Thought (VST) of leveraging large vision-language models for multimodal recommendation. We utilize user history as in-context user preferences to address the first challenge. Next, we prompt LVLMs to generate item image summaries and utilize image comprehension in natural language space combined with item titles to query the user preferences over candidate items. We conduct comprehensive experiments across four datasets with three LVLMs: GPT4-V, LLaVa-7b, and LLaVa-13b. The numerical results indicate the efficacy of VST.
- [211] arXiv:2402.08755 [ pdf , ps , html , other ]
-
Title: LLM-driven Imitation of Subrational Behavior : Illusion or Reality?Subjects: Artificial Intelligence (cs.AI) ; General Economics (econ.GN)
Abstract: Modeling subrational agents, such as humans or economic households, is inherently challenging due to the difficulty in calibrating reinforcement learning models or collecting data that involves human subjects. Existing work highlights the ability of Large Language Models (LLMs) to address complex reasoning tasks and mimic human communication, while simulation using LLMs as agents shows emergent social behaviors, potentially improving our comprehension of human conduct. In this paper, we propose to investigate the use of LLMs to generate synthetic human demonstrations, which are then used to learn subrational agent policies though Imitation Learning. We make an assumption that LLMs can be used as implicit computational models of humans, and propose a framework to use synthetic demonstrations derived from LLMs to model subrational behaviors that are characteristic of humans (e.g., myopic behavior or preference for risk aversion). We experimentally evaluate the ability of our framework to model sub-rationality through four simple scenarios, including the well-researched ultimatum game and marshmallow experiment. To gain confidence in our framework, we are able to replicate well-established findings from prior human studies associated with the above scenarios. We conclude by discussing the potential benefits, challenges and limitations of our framework.
- [212] arXiv:2402.08772 [ pdf , ps , html , other ]
-
Title: Optimal Task Assignment and Path Planning using Conflict-Based Search with Precedence and Temporal ConstraintsSubjects: Artificial Intelligence (cs.AI) ; Multiagent Systems (cs.MA)
Abstract: The Multi-Agent Path Finding (MAPF) problem entails finding collision-free paths for a set of agents, guiding them from their start to goal locations. However, MAPF does not account for several practical task-related constraints. For example, agents may need to perform actions at goal locations with specific execution times, adhering to predetermined orders and timeframes. Moreover, goal assignments may not be predefined for agents, and the optimization objective may lack an explicit definition. To incorporate task assignment, path planning, and a user-defined objective into a coherent framework, this paper examines the Task Assignment and Path Finding with Precedence and Temporal Constraints (TAPF-PTC) problem. We augment Conflict-Based Search (CBS) to simultaneously generate task assignments and collision-free paths that adhere to precedence and temporal constraints, maximizing an objective quantified by the return from a user-defined reward function in reinforcement learning (RL). Experimentally, we demonstrate that our algorithm, CBS-TA-PTC, can solve highly challenging bomb-defusing tasks with precedence and temporal constraints efficiently relative to MARL and adapted Target Assignment and Path Finding (TAPF) methods.
- [213] arXiv:2402.08780 [ pdf , ps , html , other ]
-
Title: Enhanced Deep Q-Learning for 2D Self-Driving Cars: Implementation and Evaluation on a Custom Track EnvironmentComments: 8 pages, 8 figuresSubjects: Artificial Intelligence (cs.AI)
Abstract: This research project presents the implementation of a Deep Q-Learning Network (DQN) for a self-driving car on a 2-dimensional (2D) custom track, with the objective of enhancing the DQN network's performance. It encompasses the development of a custom driving environment using Pygame on a track surrounding the University of Memphis map, as well as the design and implementation of the DQN model. The algorithm utilizes data from 7 sensors installed in the car, which measure the distance between the car and the track. These sensors are positioned in front of the vehicle, spaced 20 degrees apart, enabling them to sense a wide area ahead. We successfully implemented the DQN and also a modified version of the DQN with a priority-based action selection mechanism, which we refer to as modified DQN. The model was trained over 1000 episodes, and the average reward received by the agent was found to be around 40, which is approximately 60% higher than the original DQN and around 50% higher than the vanilla neural network.
- [214] arXiv:2402.08806 [ pdf , ps , html , other ]
-
Title: Combining Insights From Multiple Large Language Models Improves Diagnostic AccuracyComments: 5 pages, 2 figures, 1 tableSubjects: Artificial Intelligence (cs.AI)
Abstract: Background: Large language models (LLMs) such as OpenAI's GPT-4 or Google's PaLM 2 are proposed as viable diagnostic support tools or even spoken of as replacements for "curbside consults". However, even LLMs specifically trained on medical topics may lack sufficient diagnostic accuracy for real-life applications.
Methods: Using collective intelligence methods and a dataset of 200 clinical vignettes of real-life cases, we assessed and compared the accuracy of differential diagnoses obtained by asking individual commercial LLMs (OpenAI GPT-4, Google PaLM 2, Cohere Command, Meta Llama 2) against the accuracy of differential diagnoses synthesized by aggregating responses from combinations of the same LLMs.
Results: We find that aggregating responses from multiple, various LLMs leads to more accurate differential diagnoses (average accuracy for 3 LLMs: $75.3\%\pm 1.6pp$) compared to the differential diagnoses produced by single LLMs (average accuracy for single LLMs: $59.0\%\pm 6.1pp$).
Discussion: The use of collective intelligence methods to synthesize differential diagnoses combining the responses of different LLMs achieves two of the necessary steps towards advancing acceptance of LLMs as a diagnostic support tool: (1) demonstrate high diagnostic accuracy and (2) eliminate dependence on a single commercial vendor. - [215] arXiv:2402.08859 [ pdf , ps , html , other ]
-
Title: Large Language Model with Graph Convolution for RecommendationYingpeng Du , Ziyan Wang , Zhu Sun , Haoyan Chua , Hongzhi Liu , Zhonghai Wu , Yining Ma , Jie Zhang , Youchen SunSubjects: Artificial Intelligence (cs.AI)
Abstract: In recent years, efforts have been made to use text information for better user profiling and item characterization in recommendations. However, text information can sometimes be of low quality, hindering its effectiveness for real-world applications. With knowledge and reasoning capabilities capsuled in Large Language Models (LLMs), utilizing LLMs emerges as a promising way for description improvement. However, existing ways of prompting LLMs with raw texts ignore structured knowledge of user-item interactions, which may lead to hallucination problems like inconsistent description generation. To this end, we propose a Graph-aware Convolutional LLM method to elicit LLMs to capture high-order relations in the user-item graph. To adapt text-based LLMs with structured graphs, We use the LLM as an aggregator in graph processing, allowing it to understand graph-based information step by step. Specifically, the LLM is required for description enhancement by exploring multi-hop neighbors layer by layer, thereby propagating information progressively in the graph. To enable LLMs to capture large-scale graph information, we break down the description task into smaller parts, which drastically reduces the context length of the token input with each step. Extensive experiments on three real-world datasets show that our method consistently outperforms state-of-the-art methods.
- [216] arXiv:2402.08869 [ pdf , ps , other ]
-
Title: ScamSpot: Fighting Financial Fraud in Instagram CommentsComments: EACL 2024 Demo Paper, 11 pagesSubjects: Artificial Intelligence (cs.AI)
Abstract: The long-standing problem of spam and fraudulent messages in the comment sections of Instagram pages in the financial sector claims new victims every day. Instagram's current spam filter proves inadequate, and existing research approaches are primarily confined to theoretical concepts. Practical implementations with evaluated results are missing. To solve this problem, we propose ScamSpot, a comprehensive system that includes a browser extension, a fine-tuned BERT model and a REST API. This approach ensures public accessibility of our results for Instagram users using the Chrome browser. Furthermore, we conduct a data annotation study, shedding light on the reasons and causes of the problem and evaluate the system through user feedback and comparison with existing models. ScamSpot is an open-source project and is publicly available at this https URL .
- [217] arXiv:2402.08939 [ pdf , ps , html , other ]
-
Title: Premise Order Matters in Reasoning with Large Language ModelsComments: Xinyun and Ryan contribute equallySubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Large language models (LLMs) have accomplished remarkable reasoning performance in various domains. However, in the domain of reasoning tasks, we discover a frailty: LLMs are surprisingly brittle to the ordering of the premises, despite the fact that such ordering does not alter the underlying task. In particular, we observe that LLMs achieve the best performance when the premise order aligns with the context required in intermediate reasoning steps. For example, in deductive reasoning tasks, presenting the premises in the same order as the ground truth proof in the prompt (as opposed to random ordering) drastically increases the model's accuracy. We first examine the effect of premise ordering on deductive reasoning on a variety of LLMs, and our evaluation shows that permuting the premise order can cause a performance drop of over 30%. In addition, we release the benchmark R-GSM, based on GSM8K, to examine the ordering effect for mathematical problem-solving, and we again observe a significant drop in accuracy, relative to the original GSM8K benchmark.
- [218] arXiv:2402.08955 [ pdf , ps , html , other ]
-
Title: Using Counterfactual Tasks to Evaluate the Generality of Analogical Reasoning in Large Language ModelsSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Large language models (LLMs) have performed well on several reasoning benchmarks, including ones that test analogical reasoning abilities. However, it has been debated whether they are actually performing humanlike abstract reasoning or instead employing less general processes that rely on similarity to what has been seen in their training data. Here we investigate the generality of analogy-making abilities previously claimed for LLMs (Webb, Holyoak, & Lu, 2023). We take one set of analogy problems used to evaluate LLMs and create a set of "counterfactual" variants-versions that test the same abstract reasoning abilities but that are likely dissimilar from any pre-training data. We test humans and three GPT models on both the original and counterfactual problems, and show that, while the performance of humans remains high for all the problems, the GPT models' performance declines sharply on the counterfactual set. This work provides evidence that, despite previously reported successes of LLMs on analogical reasoning, these models lack the robustness and generality of human analogy-making.
- [219] arXiv:2402.08957 [ pdf , ps , html , other ]
-
Title: MUSTARD: Mastering Uniform Synthesis of Theorem and Proof DataYinya Huang , Xiaohan Lin , Zhengying Liu , Qingxing Cao , Huajian Xin , Haiming Wang , Zhenguo Li , Linqi Song , Xiaodan LiangJournal-ref: ICLR 2024 spotlightSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Formal Languages and Automata Theory (cs.FL); Machine Learning (cs.LG); Programming Languages (cs.PL)
Abstract: Recent large language models (LLMs) have witnessed significant advancement in various tasks, including mathematical reasoning and theorem proving. As these two tasks require strict and formal multi-step inference, they are appealing domains for exploring the reasoning ability of LLMs but still face important challenges. Previous studies such as Chain-of-Thought (CoT) have revealed the effectiveness of intermediate steps guidance. However, such step-wise annotation requires heavy labor, leading to insufficient training steps for current benchmarks. To fill this gap, this work introduces MUSTARD, a data generation framework that masters uniform synthesis of theorem and proof data of high quality and diversity. MUSTARD synthesizes data in three stages: (1) It samples a few mathematical concept seeds as the problem category. (2) Then, it prompts a generative language model with the sampled concepts to obtain both the problems and their step-wise formal solutions. (3) Lastly, the framework utilizes a proof assistant (e.g., Lean Prover) to filter the valid proofs. With the proposed MUSTARD, we present a theorem-and-proof benchmark MUSTARDSAUCE with 5,866 valid data points. Each data point contains an informal statement, an informal proof, and a translated formal proof that passes the prover validation. We perform extensive analysis and demonstrate that MUSTARD generates validated high-quality step-by-step data. We further apply the MUSTARDSAUCE for fine-tuning smaller language models. The fine-tuned Llama 2-7B achieves a 15.41% average relative performance gain in automated theorem proving, and 8.18% in math word problems. Codes and data are available at this https URL .
- [220] arXiv:2402.08961 [ pdf , ps , html , other ]
-
Title: HyCubE: Efficient Knowledge Hypergraph 3D Circular Convolutional EmbeddingComments: 11 pages, 5 figuresSubjects: Artificial Intelligence (cs.AI)
Abstract: Existing knowledge hypergraph embedding methods mainly focused on improving model performance, but their model structures are becoming more complex and redundant. Furthermore, due to the inherent complex semantic knowledge, the computation of knowledge hypergraph embedding models is often very expensive, leading to low efficiency. In this paper, we propose a feature interaction and extraction-enhanced 3D circular convolutional embedding model, HyCubE, which designs a novel 3D circular convolutional neural network and introduces the alternate mask stack strategy to achieve efficient n-ary knowledge hypergraph embedding. By adaptively adjusting the 3D circular convolution kernel size and uniformly embedding the entity position information, HyCubE improves the model performance with fewer parameters and reaches a better trade-off between model performance and efficiency. In addition, we use 1-N multilinear scoring based on the entity mask mechanism to further accelerate the model training efficiency. Finally, extensive experimental results on all datasets demonstrate that HyCubE consistently outperforms state-of-the-art baselines, with an average improvement of 4.08%-10.77% and a maximum improvement of 21.16% across all metrics. Commendably, HyCubE speeds up by an average of 7.55x and reduces memory usage by an average of 77.02% compared to the latest state-of-the-art baselines.
- [221] arXiv:2402.08968 [ pdf , ps , html , other ]
-
Title: GrounDial: Human-norm Grounded Safe Dialog Response GenerationComments: Accepted to findings of EACL 2024Subjects: Artificial Intelligence (cs.AI)
Abstract: Current conversational AI systems based on large language models (LLMs) are known to generate unsafe responses, agreeing to offensive user input or including toxic content. Previous research aimed to alleviate the toxicity, by fine-tuning LLM with manually annotated safe dialogue histories. However, the dependency on additional tuning requires substantial costs. To remove the dependency, we propose GrounDial, where response safety is achieved by grounding responses to commonsense social rules without requiring fine-tuning. A hybrid approach of in-context learning and human-norm-guided decoding of GrounDial enables the response to be quantitatively and qualitatively safer even without additional data or tuning.
- [222] arXiv:2402.09046 [ pdf , ps , html , other ]
-
Title: Inference of Abstraction for a Unified Account of Reasoning and LearningComments: arXiv admin note: substantial text overlap with arXiv:2402.08646Subjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Logic in Computer Science (cs.LO)
Abstract: Inspired by Bayesian approaches to brain function in neuroscience, we give a simple theory of probabilistic inference for a unified account of reasoning and learning. We simply model how data cause symbolic knowledge in terms of its satisfiability in formal logic. The underlying idea is that reasoning is a process of deriving symbolic knowledge from data via abstraction, i.e., selective ignorance. The logical consequence relation is discussed for its proof-based theoretical correctness. The MNIST dataset is discussed for its experiment-based empirical correctness.
- [223] arXiv:2402.09047 [ pdf , ps , html , other ]
-
Title: FGeo-TP: A Language Model-Enhanced Solver for Geometry ProblemsComments: 16 pagesSubjects: Artificial Intelligence (cs.AI)
Abstract: The application of contemporary artificial intelligence techniques to address geometric problems and automated deductive proof has always been a grand challenge to the interdiscipline field of mathematics and artificial Intelligence. This is the fourth article in a series of our works, in our previous work, we established of a geometric formalized system known as FormalGeo. Moreover we annotated approximately 7000 geometric problems, forming the FormalGeo7k dataset. Despite the FGPS (Formal Geometry Problem Solver) can achieve interpretable algebraic equation solving and human-like deductive reasoning, it often experiences timeouts due to the complexity of the search strategy. In this paper, we introduced FGeo-TP (Theorem Predictor), which utilizes the language model to predict theorem sequences for solving geometry problems. We compared the effectiveness of various Transformer architectures, such as BART or T5, in theorem prediction, implementing pruning in the search process of FGPS, thereby improving its performance in solving geometry problems. Our results demonstrate a significant increase in the problem-solving rate of the language model-enhanced FGeo-TP on the FormalGeo7k dataset, rising from 39.7% to 80.86%. Furthermore, FGeo-TP exhibits notable reductions in solving time and search steps across problems of varying difficulty levels.
- [224] arXiv:2402.09051 [ pdf , ps , other ]
-
Title: FGeo-DRL: Deductive Reasoning for Geometric Problems through Deep Reinforcement LearningComments: 15 pagesSubjects: Artificial Intelligence (cs.AI)
Abstract: The human-like automatic deductive reasoning has always been one of the most challenging open problems in the interdiscipline of mathematics and artificial intelligence. This paper is the third in a series of our works. We built a neural-symbolic system, called FGeoDRL, to automatically perform human-like geometric deductive reasoning. The neural part is an AI agent based on reinforcement learning, capable of autonomously learning problem-solving methods from the feedback of a formalized environment, without the need for human supervision. It leverages a pre-trained natural language model to establish a policy network for theorem selection and employ Monte Carlo Tree Search for heuristic exploration. The symbolic part is a reinforcement learning environment based on geometry formalization theory and FormalGeo, which models GPS as a Markov Decision Process. In this formal symbolic system, the known conditions and objectives of the problem form the state space, while the set of theorems forms the action space. Leveraging FGeoDRL, we have achieved readable and verifiable automated solutions to geometric problems. Experiments conducted on the formalgeo7k dataset have achieved a problem-solving success rate of 86.40%. The project is available at this https URL .
- [225] arXiv:2402.09052 [ pdf , ps , other ]
-
Title: L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional ObjectsSubjects: Artificial Intelligence (cs.AI)
Abstract: Diffusion-based image generation models such as DALL-E 3 and Stable Diffusion-XL demonstrate remarkable capabilities in generating images with realistic and unique compositions. Yet, these models are not robust in precisely reasoning about physical and spatial configurations of objects, especially when instructed with unconventional, thereby out-of-distribution descriptions, such as "a chair with five legs". In this paper, we propose a language agent with chain-of-3D-thoughts (L3GO), an inference-time approach that can reason about part-based 3D mesh generation of unconventional objects that current data-driven diffusion models struggle with. More concretely, we use large language models as agents to compose a desired object via trial-and-error within the 3D simulation environment. To facilitate our investigation, we develop a new benchmark, Unconventionally Feasible Objects (UFO), as well as SimpleBlenv, a wrapper environment built on top of Blender where language agents can build and compose atomic building blocks via API calls. Human and automatic GPT-4V evaluations show that our approach surpasses the standard GPT-4 and other language agents (e.g., ReAct and Reflexion) for 3D mesh generation on ShapeNet. Moreover, when tested on our UFO benchmark, our approach outperforms other state-of-the-art text-to-2D image and text-to-3D models based on human evaluation.
- [226] arXiv:2402.09056 [ pdf , ps , html , other ]
-
Title: Is Epistemic Uncertainty Faithfully Represented by Evidential Deep Learning Methods?Subjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Trustworthy ML systems should not only return accurate predictions, but also a reliable representation of their uncertainty. Bayesian methods are commonly used to quantify both aleatoric and epistemic uncertainty, but alternative approaches, such as evidential deep learning methods, have become popular in recent years. The latter group of methods in essence extends empirical risk minimization (ERM) for predicting second-order probability distributions over outcomes, from which measures of epistemic (and aleatoric) uncertainty can be extracted. This paper presents novel theoretical insights of evidential deep learning, highlighting the difficulties in optimizing second-order loss functions and interpreting the resulting epistemic uncertainty measures. With a systematic setup that covers a wide range of approaches for classification, regression and counts, it provides novel insights into issues of identifiability and convergence in second-order loss minimization, and the relative (rather than absolute) nature of epistemic uncertainty measures.
- [227] arXiv:2402.09085 [ pdf , ps , html , other ]
-
Title: Polynomial Semantics of Tractable Probabilistic CircuitsSubjects: Artificial Intelligence (cs.AI)
Abstract: Probabilistic circuits compute multilinear polynomials that represent multivariate probability distributions. They are tractable models that support efficient marginal inference. However, various polynomial semantics have been considered in the literature (e.g., network polynomials, likelihood polynomials, generating functions, and Fourier transforms). The relationships between circuit representations of these polynomial encodings of distributions is largely unknown. In this paper, we prove that for distributions over binary variables, each of these probabilistic circuit models is equivalent in the sense that any circuit for one of them can be transformed into a circuit for any of the others with only a polynomial increase in size. They are therefore all tractable for marginal inference on the same class of distributions. Finally, we explore the natural extension of one such polynomial semantics, called probabilistic generating circuits, to categorical random variables, and establish that inference becomes #P-hard.
- [228] arXiv:2402.09099 [ pdf , ps , html , other ]
-
Title: Exploring Neuron Interactions and Emergence in LLMs: From the Multifractal Analysis PerspectiveXiongye Xiao , Chenyu Zhou , Heng Ping , Defu Cao , Yaxing Li , Yizhuo Zhou , Shixuan Li , Paul BogdanSubjects: Artificial Intelligence (cs.AI)
Abstract: Prior studies on the emergence in large models have primarily focused on how the functional capabilities of large language models (LLMs) scale with model size. Our research, however, transcends this traditional paradigm, aiming to deepen our understanding of the emergence within LLMs by placing a special emphasis not just on the model size but more significantly on the complex behavior of neuron interactions during the training process. By introducing the concepts of "self-organization" and "multifractal analysis," we explore how neuron interactions dynamically evolve during training, leading to "emergence," mirroring the phenomenon in natural systems where simple micro-level interactions give rise to complex macro-level behaviors. To quantitatively analyze the continuously evolving interactions among neurons in large models during training, we propose the Neuron-based Multifractal Analysis (NeuroMFA). Utilizing NeuroMFA, we conduct a comprehensive examination of the emergent behavior in LLMs through the lens of both model size and training process, paving new avenues for research into the emergence in large models.
- [229] arXiv:2402.09132 [ pdf , ps , html , other ]
-
Title: Exploring the Adversarial Capabilities of Large Language ModelsSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: The proliferation of large language models (LLMs) has sparked widespread and general interest due to their strong language generation capabilities, offering great potential for both industry and research. While previous research delved into the security and privacy issues of LLMs, the extent to which these models can exhibit adversarial behavior remains largely unexplored. Addressing this gap, we investigate whether common publicly available LLMs have inherent capabilities to perturb text samples to fool safety measures, so-called adversarial examples resp.~attacks. More specifically, we investigate whether LLMs are inherently able to craft adversarial examples out of benign samples to fool existing safe rails. Our experiments, which focus on hate speech detection, reveal that LLMs succeed in finding adversarial perturbations, effectively undermining hate speech detection systems. Our findings carry significant implications for (semi-)autonomous systems relying on LLMs, highlighting potential challenges in their interaction with existing systems and safety measures.
- [230] arXiv:2402.09147 [ pdf , ps , other ]
-
Title: Into the Unknown: Self-Learning Large Language ModelsComments: 14 pages, 13 figures, to be submitted to ACL 2024Subjects: Artificial Intelligence (cs.AI)
Abstract: We address the main problem of self-learning LLM: the question of what to learn. We propose a self-learning LLM framework that enables an LLM to independently learn previously unknown knowledge through self-assessment of their own hallucinations. Using the hallucination score, we introduce a new concept of Points in The Unknown (PiUs), along with one extrinsic and three intrinsic methods for automatic PiUs identification. It facilitates the creation of a self-learning loop that focuses exclusively on the knowledge gap in Points in The Unknown, resulting in a reduced hallucination score. We also developed evaluation metrics for gauging an LLM's self-learning capability. Our experiments revealed that 7B-Mistral models that have been finetuned or aligned are capable of self-learning considerably well. Our self-learning concept allows more efficient LLM updates and opens new perspectives for knowledge exchange. It may also increase public trust in AI.
- [231] arXiv:2402.09161 [ pdf , ps , other ]
-
Title: Role-Playing Simulation Games using ChatGPTComments: Link to online article: this https URLJournal-ref: ERCIM News Special Theme: Large Language Models 2024Subjects: Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC)
Abstract: Since the COVID-19 pandemic, educational institutions have embarked on digital transformation projects. The success of these projects depends on integrating new technologies and understanding the needs of digitally literate students. The "learning by doing" approach suggests that real success in learning new skills is achieved when students can try out and practise these skills. In this article, we demonstrate how Large Language Models (LLMs) can enhance the quality of teaching by using ChatGPT in a role-playing simulation game scenario to promote active learning. Moreover, we discuss how LLMs can boost students' interest in learning by allowing them to practice real-life scenarios using ChatGPT.
- [232] arXiv:2402.09221 [ pdf , ps , html , other ]
-
Title: Spectral Filters, Dark Signals, and Attention SinksSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Projecting intermediate representations onto the vocabulary is an increasingly popular interpretation tool for transformer-based LLMs, also known as the logit lens. We propose a quantitative extension to this approach and define spectral filters on intermediate representations based on partitioning the singular vectors of the vocabulary embedding and unembedding matrices into bands. We find that the signals exchanged in the tail end of the spectrum are responsible for attention sinking (Xiao et al. 2023), of which we provide an explanation. We find that the loss of pretrained models can be kept low despite suppressing sizable parts of the embedding spectrum in a layer-dependent way, as long as attention sinking is preserved. Finally, we discover that the representation of tokens that draw attention from many tokens have large projections on the tail end of the spectrum.
- [233] arXiv:2402.09266 [ pdf , ps , html , other ]
-
Title: Machine Learning in management of precautionary closures caused by lipophilic biotoxinsJournal-ref: Computers and Electronics in Agriculture, 197, 106956. (2022)Subjects: Artificial Intelligence (cs.AI)
Abstract: Mussel farming is one of the most important aquaculture industries. The main risk to mussel farming is harmful algal blooms (HABs), which pose a risk to human consumption. In Galicia, the Spanish main producer of cultivated mussels, the opening and closing of the production areas is controlled by a monitoring program. In addition to the closures resulting from the presence of toxicity exceeding the legal threshold, in the absence of a confirmatory sampling and the existence of risk factors, precautionary closures may be applied. These decisions are made by experts without the support or formalisation of the experience on which they are based. Therefore, this work proposes a predictive model capable of supporting the application of precautionary closures. Achieving sensitivity, accuracy and kappa index values of 97.34%, 91.83% and 0.75 respectively, the kNN algorithm has provided the best results. This allows the creation of a system capable of helping in complex situations where forecast errors are more common.
- [234] arXiv:2402.09286 [ pdf , ps , other ]
-
Title: Nutrition Facts, Drug Facts, and Model Facts: Putting AI Ethics into Practice in Gun Violence ResearchSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Objective: Firearm injury research necessitates using data from often-exploited vulnerable populations of Black and Brown Americans. In order to minimize distrust, this study provides a framework for establishing AI trust and transparency with the general population. Methods: We propose a Model Facts template that is easily extendable and decomposes accuracy and demographics into standardized and minimally complex values. This framework allows general users to assess the validity and biases of a model without diving into technical model documentation. Examples: We apply the Model Facts template on two previously published models, a violence risk identification model and a suicide risk prediction model. We demonstrate the ease of accessing the appropriate information when the data is structured appropriately. Discussion: The Model Facts template is limited in its current form to human based data and biases. Like nutrition facts, it also will require some educational resources for users to grasp its full utility. Human computer interaction experiments should be conducted to ensure that the interaction between user interface and model interface is as desired. Conclusion: The Model Facts label is the first framework dedicated to establishing trust with end users and general population consumers. Implementation of Model Facts into firearm injury research will provide public health practitioners and those impacted by firearm injury greater faith in the tools the research provides.
- [235] arXiv:2402.09334 [ pdf , ps , html , other ]
-
Title: AuditLLM: A Tool for Auditing Large Language Models Using Multiprobe ApproachSubjects: Artificial Intelligence (cs.AI)
Abstract: As Large Language Models (LLMs) gain wider adoption in various contexts, it becomes crucial to ensure they are reasonably safe, consistent, and reliable for an application at hand. This may require probing or auditing them. Probing LLMs with varied iterations of a single question could reveal potential inconsistencies in their knowledge or functionality. However, a tool for performing such audits with simple workflow and low technical threshold is lacking. In this demo, we introduce "AuditLLM," a novel tool designed to evaluate the performance of various LLMs in a methodical way. AuditLLM's core functionality lies in its ability to test a given LLM by auditing it using multiple probes generated from a single question, thereby identifying any inconsistencies in the model's understanding or operation. A reasonably robust, reliable, and consistent LLM should output semantically similar responses for a question asked differently or by different people. Based on this assumption, AuditLLM produces easily interpretable results regarding the LLM's consistencies from a single question that the user enters. A certain level of inconsistency has been shown to be an indicator of potential bias, hallucinations, and other issues. One could then use the output of AuditLLM to further investigate issues with the aforementioned LLM. To facilitate demonstration and practical uses, AuditLLM offers two key modes: (1) Live mode which allows instant auditing of LLMs by analyzing responses to real-time queries; (2) Batch mode which facilitates comprehensive LLM auditing by processing multiple queries at once for in-depth analysis. This tool is beneficial for both researchers and general users, as it enhances our understanding of LLMs' capabilities in generating responses, using a standardized auditing platform.
- [236] arXiv:2402.09346 [ pdf , ps , html , other ]
-
Title: Developing a Framework for Auditing Large Language Models Using Human-in-the-LoopMaryam Amirizaniani , Jihan Yao , Adrian Lavergne , Elizabeth Snell Okada , Aman Chadha , Tanya Roosta , Chirag ShahSubjects: Artificial Intelligence (cs.AI)
Abstract: As LLMs become more pervasive across various users and scenarios, identifying potential issues when using these models becomes essential. Examples include bias, inconsistencies, and hallucination. Although auditing the LLM for these problems is desirable, it is far from being easy or solved. An effective method is to probe the LLM using different versions of the same question. This could expose inconsistencies in its knowledge or operation, indicating potential for bias or hallucination. However, to operationalize this auditing method at scale, we need an approach to create those probes reliably and automatically. In this paper we propose an automatic and scalable solution, where one uses a different LLM along with human-in-the-loop. This approach offers verifiability and transparency, while avoiding circular reliance on the same LLMs, and increasing scientific rigor and generalizability. Specifically, we present a novel methodology with two phases of verification using humans: standardized evaluation criteria to verify responses, and a structured prompt template to generate desired probes. Experiments on a set of questions from TruthfulQA dataset show that we can generate a reliable set of probes from one LLM that can be used to audit inconsistencies in a different LLM. The criteria for generating and applying auditing probes is generalizable to various LLMs regardless of the underlying structure or training mechanism.
- [237] arXiv:2402.09358 [ pdf , ps , other ]
-
Title: Integrating ChatGPT into Secure Hospital Networks: A Case Study on Improving Radiology Report AnalysisSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: This study demonstrates the first in-hospital adaptation of a cloud-based AI, similar to ChatGPT, into a secure model for analyzing radiology reports, prioritizing patient data privacy. By employing a unique sentence-level knowledge distillation method through contrastive learning, we achieve over 95% accuracy in detecting anomalies. The model also accurately flags uncertainties in its predictions, enhancing its reliability and interpretability for physicians with certainty indicators. These advancements represent significant progress in developing secure and efficient AI tools for healthcare, suggesting a promising future for in-hospital AI applications with minimal supervision.
- [238] arXiv:2402.09388 [ pdf , ps , other ]
-
Title: Entropy-regularized Point-based Value IterationHarrison Delecki , Marcell Vazquez-Chanlatte , Esen Yel , Kyle Wray , Tomer Arnon , Stefan Witwicki , Mykel J. KochenderferSubjects: Artificial Intelligence (cs.AI)
Abstract: Model-based planners for partially observable problems must accommodate both model uncertainty during planning and goal uncertainty during objective inference. However, model-based planners may be brittle under these types of uncertainty because they rely on an exact model and tend to commit to a single optimal behavior. Inspired by results in the model-free setting, we propose an entropy-regularized model-based planner for partially observable problems. Entropy regularization promotes policy robustness for planning and objective inference by encouraging policies to be no more committed to a single action than necessary. We evaluate the robustness and objective inference performance of entropy-regularized policies in three problem domains. Our results show that entropy-regularized policies outperform non-entropy-regularized baselines in terms of higher expected returns under modeling errors and higher accuracy during objective inference.
- [239] arXiv:2402.09390 [ pdf , ps , other ]
-
Title: HGOT: Hierarchical Graph of Thoughts for Retrieval-Augmented In-Context Learning in Factuality EvaluationSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: With the widespread adoption of large language models (LLMs) in numerous applications, the challenge of factuality and the propensity for hallucinations raises significant concerns. To address this issue, particularly in retrieval-augmented in-context learning, we introduce the hierarchical graph of thoughts (HGOT), a structured, multi-layered graph approach designed to enhance the retrieval of pertinent passages during in-context learning. The framework utilizes the emergent planning capabilities of LLMs, employing the divide-and-conquer strategy to break down complex queries into manageable sub-queries. It refines self-consistency majority voting for answer selection, which incorporates the recently proposed citation recall and precision metrics to assess the quality of thoughts, linking an answer's credibility intrinsically to the thought's quality. This methodology introduces a weighted system in majority voting, prioritizing answers based on the citation quality of their thoughts. Additionally, we propose a scoring mechanism for evaluating retrieved passages, considering factors such as citation frequency and quality, self-consistency confidence, and the retrieval module's ranking. Experiments reveal that HGOT outperforms other retrieval-augmented in-context learning methods, including Demonstrate-Search-Predict (DSP), ReAct, Self-Ask, and Retrieve-then-Read on different datasets by as much as $7\%$, demonstrating its efficacy in enhancing the factuality of LLMs.
- [240] arXiv:2402.09391 [ pdf , ps , html , other ]
-
Title: LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning DatasetComments: Added further analysis experiments. Work in progressSubjects: Artificial Intelligence (cs.AI) ; Computational Engineering, Finance, and Science (cs.CE); Computation and Language (cs.CL)
Abstract: Chemistry plays a crucial role in many domains, such as drug discovery and material science. While large language models (LLMs) such as GPT-4 exhibit remarkable capabilities on natural language processing tasks, existing research indicates that their performance on chemistry tasks is discouragingly low. In this paper, however, we demonstrate that our developed LLMs can achieve very strong results on a comprehensive set of chemistry tasks, outperforming the most advanced GPT-4 and Claude 3 Opus by a substantial margin. To accomplish this, we propose SMolInstruct, a large-scale, comprehensive, and high-quality dataset for instruction tuning. It contains 14 selected chemistry tasks and over three million samples, laying a solid foundation for training and evaluating LLMs for chemistry. Using SMolInstruct, we fine-tune a set of open-source LLMs, among which, we find that Mistral serves as the best base model for chemistry tasks. Our analysis further demonstrates the critical role of the proposed dataset in driving the performance improvements.
- [241] arXiv:2402.09413 [ pdf , ps , html , other ]
-
Title: Mathematical ExplanationsSubjects: Artificial Intelligence (cs.AI)
Abstract: A definition of what counts as an explanation of mathematical statement, and when one explanation is better than another, is given. Since all mathematical facts must be true in all causal models, and hence known by an agent, mathematical facts cannot be part of an explanation (under the standard notion of explanation). This problem is solved using impossible possible worlds.
- [242] arXiv:2402.09498 [ pdf , ps , other ]
-
Title: Detection of the most influential variables for preventing postpartum urinary incontinence using machine learning techniquesJosé Alberto Benítez-Andrades , María Teresa García-Ordás , María Álvarez-González , Raquel Leirós-Rodríguez , Ana F López RodríguezJournal-ref: Digital Health, Volume 8, 2022, 20552076221111289Subjects: Artificial Intelligence (cs.AI)
Abstract: Background: Postpartum urinary incontinence (PUI) is a common issue among postnatal women. Previous studies identified potential related variables, but lacked analysis on certain intrinsic and extrinsic patient variables during pregnancy.
Objective: The study aims to evaluate the most influential variables in PUI using machine learning, focusing on intrinsic, extrinsic, and combined variable groups.
Methods: Data from 93 pregnant women were analyzed using machine learning and oversampling techniques. Four key variables were predicted: occurrence, frequency, intensity of urinary incontinence, and stress urinary incontinence.
Results: Models using extrinsic variables were most accurate, with 70% accuracy for urinary incontinence, 77% for frequency, 71% for intensity, and 93% for stress urinary incontinence.
Conclusions: The study highlights extrinsic variables as significant predictors of PUI issues. This suggests that PUI prevention might be achievable through healthy habits during pregnancy, although further research is needed for confirmation. - [243] arXiv:2402.09500 [ pdf , ps , html , other ]
-
Title: On Formally Undecidable Traits of Intelligent MachinesComments: 34 pagesSubjects: Artificial Intelligence (cs.AI) ; Logic in Computer Science (cs.LO)
Abstract: Building on work by Alfonseca et al. (2021), we study the conditions necessary for it to be logically possible to prove that an arbitrary artificially intelligent machine will exhibit certain behavior. To do this, we develop a formalism like -- but mathematically distinct from -- the theory of formal languages and their properties. Our formalism affords a precise means for not only talking about the traits we desire of machines (such as them being intelligent, contained, moral, and so forth), but also for detailing the conditions necessary for it to be logically possible to decide whether a given arbitrary machine possesses such a trait or not. Contrary to Alfonseca et al.'s (2021) results, we find that Rice's theorem from computability theory cannot in general be used to determine whether an arbitrary machine possesses a given trait or not. Therefore, it is not necessarily the case that deciding whether an arbitrary machine is intelligent, contained, moral, and so forth is logically impossible.
- [244] arXiv:2402.09553 [ pdf , ps , html , other ]
-
Title: Statistical and Machine Learning Models for Predicting Fire and Other Emergency EventsDilli Prasad Sharma , Nasim Beigi-Mohammadi , Hongxiang Geng , Dawn Dixon , Rob Madro , Phil Emmenegger , Carlos Tobar , Jeff Li , Alberto Leon-GarciaJournal-ref: IEEE Access 12(2024) 56880-56909Subjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Machine Learning (stat.ML)
Abstract: Emergency events in a city cause considerable economic loss to individuals, their families, and the community. Accurate and timely prediction of events can help the emergency fire and rescue services in preparing for and mitigating the consequences of emergency events. In this paper, we present a systematic development of predictive models for various types of emergency events in the City of Edmonton, Canada. We present methods for (i) data collection and dataset development; (ii) descriptive analysis of each event type and its characteristics at different spatiotemporal levels; (iii) feature analysis and selection based on correlation coefficient analysis and feature importance analysis; and (iv) development of prediction models for the likelihood of occurrence of each event type at different temporal and spatial resolutions. We analyze the association of event types with socioeconomic and demographic data at the neighborhood level, identify a set of predictors for each event type, and develop predictive models with negative binomial regression. We conduct evaluations at neighborhood and fire station service area levels. Our results show that the models perform well for most of the event types with acceptable prediction errors for weekly and monthly periods. The evaluation shows that the prediction accuracy is consistent at the level of the fire station, so the predictions can be used in management by fire rescue service departments for planning resource allocation for these time periods. We also examine the impact of the COVID-19 pandemic on the occurrence of events and on the accuracy of event predictor models. Our findings show that COVID-19 had a significant impact on the performance of the event prediction models.
- [245] arXiv:2402.09558 [ pdf , ps , html , other ]
-
Title: Bidirectional Generative Pre-training for Improving Time Series Representation LearningSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Learning time-series representations for discriminative tasks has been a long-standing challenge. Current pre-training methods are limited in either unidirectional next-token prediction or randomly masked token prediction. We propose a novel architecture called Bidirectional Timely Generative Pre-trained Transformer (BiTimelyGPT), which pre-trains on time-series data by both next-token and previous-token predictions in alternating transformer layers. This pre-training task preserves original distribution and data shapes of the time-series. Additionally, the full-rank forward and backward attention matrices exhibit more expressive representation capabilities. Using biosignal data, BiTimelyGPT demonstrates superior performance in predicting neurological functionality, disease diagnosis, and physiological signs. By visualizing the attention heatmap, we observe that the pre-trained BiTimelyGPT can identify discriminative segments from time-series sequences, even more so after fine-tuning on the task.
- [246] arXiv:2402.09565 [ pdf , ps , html , other ]
-
Title: Graph-Skeleton: ~1% Nodes are Sufficient to Represent Billion-Scale GraphComments: 21 pages, 11 figures, In Proceedings of the ACM Web Conference 2024 (WWW'24)Subjects: Artificial Intelligence (cs.AI)
Abstract: Due to the ubiquity of graph data on the web, web graph mining has become a hot research spot. Nonetheless, the prevalence of large-scale web graphs in real applications poses significant challenges to storage, computational capacity and graph model design. Despite numerous studies to enhance the scalability of graph models, a noticeable gap remains between academic research and practical web graph mining applications. One major cause is that in most industrial scenarios, only a small part of nodes in a web graph are actually required to be analyzed, where we term these nodes as target nodes, while others as background nodes. In this paper, we argue that properly fetching and condensing the background nodes from massive web graph data might be a more economical shortcut to tackle the obstacles fundamentally. To this end, we make the first attempt to study the problem of massive background nodes compression for target nodes classification. Through extensive experiments, we reveal two critical roles played by the background nodes in target node classification: enhancing structural connectivity between target nodes, and feature correlation with target nodes. Followingthis, we propose a novel Graph-Skeleton1 model, which properly fetches the background nodes, and further condenses the semantic and topological information of background nodes within similar target-background local structures. Extensive experiments on various web graph datasets demonstrate the effectiveness and efficiency of the proposed method. In particular, for MAG240M dataset with 0.24 billion nodes, our generated skeleton graph achieves highly comparable performance while only containing 1.8% nodes of the original graph.
- [247] arXiv:2402.09584 [ pdf , ps , other ]
-
Title: Large Language Model-Based Interpretable Machine Learning Control in Building Energy SystemsSubjects: Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC)
Abstract: The potential of Machine Learning Control (MLC) in HVAC systems is hindered by its opaque nature and inference mechanisms, which is challenging for users and modelers to fully comprehend, ultimately leading to a lack of trust in MLC-based decision-making. To address this challenge, this paper investigates and explores Interpretable Machine Learning (IML), a branch of Machine Learning (ML) that enhances transparency and understanding of models and their inferences, to improve the credibility of MLC and its industrial application in HVAC systems. Specifically, we developed an innovative framework that combines the principles of Shapley values and the in-context learning feature of Large Language Models (LLMs). While the Shapley values are instrumental in dissecting the contributions of various features in ML models, LLM provides an in-depth understanding of rule-based parts in MLC; combining them, LLM further packages these insights into a coherent, human-understandable narrative. The paper presents a case study to demonstrate the feasibility of the developed IML framework for model predictive control-based precooling under demand response events in a virtual testbed. The results indicate that the developed framework generates and explains the control signals in accordance with the rule-based rationale.
- [248] arXiv:2402.09588 [ pdf , ps , html , other ]
-
Title: Emerging Opportunities of Using Large Language Models for Translation Between Drug Molecules and IndicationsDavid Oniani , Jordan Hilsman , Chengxi Zang , Junmei Wang , Lianjin Cai , Jan Zawala , Yanshan WangSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: A drug molecule is a substance that changes the organism's mental or physical state. Every approved drug has an indication, which refers to the therapeutic use of that drug for treating a particular medical condition. While the Large Language Model (LLM), a generative Artificial Intelligence (AI) technique, has recently demonstrated effectiveness in translating between molecules and their textual descriptions, there remains a gap in research regarding their application in facilitating the translation between drug molecules and indications, or vice versa, which could greatly benefit the drug discovery process. The capability of generating a drug from a given indication would allow for the discovery of drugs targeting specific diseases or targets and ultimately provide patients with better treatments. In this paper, we first propose a new task, which is the translation between drug molecules and corresponding indications, and then test existing LLMs on this new task. Specifically, we consider nine variations of the T5 LLM and evaluate them on two public datasets obtained from ChEMBL and DrugBank. Our experiments show the early results of using LLMs for this task and provide a perspective on the state-of-the-art. We also emphasize the current limitations and discuss future work that has the potential to improve the performance on this task. The creation of molecules from indications, or vice versa, will allow for more efficient targeting of diseases and significantly reduce the cost of drug discovery, with the potential to revolutionize the field of drug discovery in the era of generative AI.
- [249] arXiv:2402.09592 [ pdf , ps , other ]
-
Title: A Web-Based Tool for Automatic Data Collection, Curation, and Visualization of Complex Healthcare Survey Studies including Social Network AnalysisJosé Alberto Benítez-Andrades , José Emilio Labra , Enedina Quiroga , Vicente Martín , Isaías García , Pilar Marqués-Sánchez , Carmen BenavidesJournal-ref: Computation and Mathematical Methods in Medicine, Volume 2017, Article ID 2579848Subjects: Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC)
Abstract: There is a great concern nowadays regarding alcohol consumption and drug abuse, especially in young people. Analyzing the social environment where these adolescents are immersed, as well as a series of measures determining the alcohol abuse risk or personal situation and perception using a number of questionnaires like AUDIT, FAS, KIDSCREEN, and others, it is possible to gain insight into the current situation of a given individual regarding his/her consumption behavior. But this analysis, in order to be achieved, requires the use of tools that can ease the process of questionnaire creation, data gathering, curation and representation, and later analysis and visualization to the user. This research presents the design and construction of a web-based platform able to facilitate each of the mentioned processes by integrating the different phases into an intuitive system with a graphical user interface that hides the complexity underlying each of the questionnaires and techniques used and presenting the results in a flexible and visual way, avoiding any manual handling of data during the process. Advantages of this approach are shown and compared to the previous situation where some of the tasks were accomplished by time consuming and error prone manipulations of data.
- [250] arXiv:2402.09617 [ pdf , ps , html , other ]
-
Title: LLM-Enhanced User-Item Interactions: Leveraging Edge Information for Optimized RecommendationsSubjects: Artificial Intelligence (cs.AI) ; Information Retrieval (cs.IR)
Abstract: The extraordinary performance of large language models has not only reshaped the research landscape in the field of NLP but has also demonstrated its exceptional applicative potential in various domains. However, the potential of these models in mining relationships from graph data remains under-explored. Graph neural networks, as a popular research area in recent years, have numerous studies on relationship mining. Yet, current cutting-edge research in graph neural networks has not been effectively integrated with large language models, leading to limited efficiency and capability in graph relationship mining tasks. A primary challenge is the inability of LLMs to deeply exploit the edge information in graphs, which is critical for understanding complex node relationships. This gap limits the potential of LLMs to extract meaningful insights from graph structures, limiting their applicability in more complex graph-based analysis. We focus on how to utilize existing LLMs for mining and understanding relationships in graph data, applying these techniques to recommendation tasks. We propose an innovative framework that combines the strong contextual representation capabilities of LLMs with the relationship extraction and analysis functions of GNNs for mining relationships in graph data. Specifically, we design a new prompt construction framework that integrates relational information of graph data into natural language expressions, aiding LLMs in more intuitively grasping the connectivity information within graph data. Additionally, we introduce graph relationship understanding and analysis functions into LLMs to enhance their focus on connectivity information in graph data. Our evaluation on real-world datasets demonstrates the framework's ability to understand connectivity information in graph data.
- [251] arXiv:2402.09654 [ pdf , ps , html , other ]
-
Title: GPT-4's assessment of its performance in a USMLE-based case studyUttam Dhakal , Aniket Kumar Singh , Suman Devkota , Yogesh Sapkota , Bishal Lamichhane , Suprinsa Paudyal , Chandra DhakalSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
Abstract: This study investigates GPT-4's assessment of its performance in healthcare applications. A simple prompting technique was used to prompt the LLM with questions taken from the United States Medical Licensing Examination (USMLE) questionnaire and it was tasked to evaluate its confidence score before posing the question and after asking the question. The questionnaire was categorized into two groups-questions with feedback (WF) and questions with no feedback(NF) post-question. The model was asked to provide absolute and relative confidence scores before and after each question. The experimental findings were analyzed using statistical tools to study the variability of confidence in WF and NF groups. Additionally, a sequential analysis was conducted to observe the performance variation for the WF and NF groups. Results indicate that feedback influences relative confidence but doesn't consistently increase or decrease it. Understanding the performance of LLM is paramount in exploring its utility in sensitive areas like healthcare. This study contributes to the ongoing discourse on the reliability of AI, particularly of LLMs like GPT-4, within healthcare, offering insights into how feedback mechanisms might be optimized to enhance AI-assisted medical education and decision support.
- [252] arXiv:2402.09656 [ pdf , ps , other ]
-
Title: The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models CollapseSubjects: Artificial Intelligence (cs.AI)
Abstract: Although model editing has shown promise in revising knowledge in Large Language Models (LLMs), its impact on the inherent capabilities of LLMs is often overlooked. In this work, we reveal a critical phenomenon: even a single edit can trigger model collapse, manifesting as significant performance degradation in various benchmark tasks. However, benchmarking LLMs after each edit, while necessary to prevent such collapses, is impractically time-consuming and resource-intensive. To mitigate this, we propose using perplexity as a surrogate metric, validated by extensive experiments demonstrating its strong correlation with downstream tasks performance. We further conduct an in-depth study on sequential editing, a practical setting for real-world scenarios, across various editing methods and LLMs, focusing on hard cases from our previous single edit studies. The results indicate that nearly all examined editing methods result in model collapse after only few edits. To facilitate further research, we have utilized GPT-3.5 to develop a new dataset, HardEdit, based on those hard cases. This dataset aims to establish the foundation for pioneering research in reliable model editing and the mechanisms underlying editing-induced model collapse. We hope this work can draw the community's attention to the potential risks inherent in model editing practices.
- [253] arXiv:2402.09660 [ pdf , ps , html , other ]
-
Title: User Modeling and User Profiling: A Comprehensive SurveyErasmo Purificato (1), Ludovico Boratto (2), Ernesto William De Luca (1) ((1) Otto von Guericke University Magdeburg, Germany, (2) University of Cagliari, Italy)Comments: 71 pagesSubjects: Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Abstract: The integration of artificial intelligence (AI) into daily life, particularly through information retrieval and recommender systems, has necessitated advanced user modeling and profiling techniques to deliver personalized experiences. These techniques aim to construct accurate user representations based on the rich amounts of data generated through interactions with these systems. This paper presents a comprehensive survey of the current state, evolution, and future directions of user modeling and profiling research. We provide a historical overview, tracing the development from early stereotype models to the latest deep learning techniques, and propose a novel taxonomy that encompasses all active topics in this research area, including recent trends. Our survey highlights the paradigm shifts towards more sophisticated user profiling methods, emphasizing implicit data collection, multi-behavior modeling, and the integration of graph data structures. We also address the critical need for privacy-preserving techniques and the push towards explainability and fairness in user modeling approaches. By examining the definitions of core terminology, we aim to clarify ambiguities and foster a clearer understanding of the field by proposing two novel encyclopedic definitions of the main terms. Furthermore, we explore the application of user modeling in various domains, such as fake news detection, cybersecurity, and personalized education. This survey serves as a comprehensive resource for researchers and practitioners, offering insights into the evolution of user modeling and profiling and guiding the development of more personalized, ethical, and effective AI systems.
- [254] arXiv:2402.09729 [ pdf , ps , other ]
-
Title: Federated Prompt-based Decision Transformer for Customized VR Services in Mobile Edge Computing SystemSubjects: Artificial Intelligence (cs.AI) ; Systems and Control (eess.SY)
Abstract: This paper investigates resource allocation to provide heterogeneous users with customized virtual reality (VR) services in a mobile edge computing (MEC) system. We first introduce a quality of experience (QoE) metric to measure user experience, which considers the MEC system's latency, user attention levels, and preferred resolutions. Then, a QoE maximization problem is formulated for resource allocation to ensure the highest possible user experience,which is cast as a reinforcement learning problem, aiming to learn a generalized policy applicable across diverse user environments for all MEC servers. To learn the generalized policy, we propose a framework that employs federated learning (FL) and prompt-based sequence modeling to pre-train a common decision model across MEC servers, which is named FedPromptDT. Using FL solves the problem of insufficient local MEC data while protecting user privacy during offline training. The design of prompts integrating user-environment cues and user-preferred allocation improves the model's adaptability to various user environments during online execution.
- [255] arXiv:2402.09734 [ pdf , ps , other ]
-
Title: Agents Need Not Know Their PurposeSubjects: Artificial Intelligence (cs.AI)
Abstract: Ensuring artificial intelligence behaves in such a way that is aligned with human values is commonly referred to as the alignment challenge. Prior work has shown that rational agents, behaving in such a way that maximizes a utility function, will inevitably behave in such a way that is not aligned with human values, especially as their level of intelligence goes up. Prior work has also shown that there is no "one true utility function"; solutions must include a more holistic approach to alignment. This paper describes oblivious agents: agents that are architected in such a way that their effective utility function is an aggregation of a known and hidden sub-functions. The hidden component, to be maximized, is internally implemented as a black box, preventing the agent from examining it. The known component, to be minimized, is knowledge of the hidden sub-function. Architectural constraints further influence how agent actions can evolve its internal environment model. We show that an oblivious agent, behaving rationally, constructs an internal approximation of designers' intentions (i.e., infers alignment), and, as a consequence of its architecture and effective utility function, behaves in such a way that maximizes alignment; i.e., maximizing the approximated intention function. We show that, paradoxically, it does this for whatever utility function is used as the hidden component and, in contrast with extant techniques, chances of alignment actually improve as agent intelligence grows.
- [256] arXiv:2402.09764 [ pdf , ps , html , other ]
-
Title: Aligning Crowd Feedback via Distributional Preference Reward ModelingSubjects: Artificial Intelligence (cs.AI)
Abstract: Deep Reinforcement Learning is widely used for aligning Large Language Models (LLM) with human preference. However, the conventional reward modelling has predominantly depended on human annotations provided by a select cohort of individuals. Such dependence may unintentionally result in models that are skewed to reflect the inclinations of these annotators, thereby failing to represent the expectations of the wider population adequately. In this paper, we introduce the Distributional Preference Reward Model (DPRM), a simple yet effective framework to align large language models with a diverse set of human preferences. To this end, we characterize the preferences by a beta distribution, which can dynamically adapt to fluctuations in preference trends. On top of that, we design an optimal-transportation-based loss to calibrate DPRM to align with the preference distribution. Finally, the expected reward is utilized to fine-tune an LLM policy to generate responses favoured by the population. Our experiments show that DPRM significantly enhances the alignment of LLMs with population preference, yielding more accurate, unbiased, and contextually appropriate responses.
- [257] arXiv:2402.09765 [ pdf , ps , other ]
-
Title: Reinforcement Learning for Solving Stochastic Vehicle Routing Problem with Time WindowsSubjects: Artificial Intelligence (cs.AI)
Abstract: This paper introduces a reinforcement learning approach to optimize the Stochastic Vehicle Routing Problem with Time Windows (SVRP), focusing on reducing travel costs in goods delivery. We develop a novel SVRP formulation that accounts for uncertain travel costs and demands, alongside specific customer time windows. An attention-based neural network trained through reinforcement learning is employed to minimize routing costs. Our approach addresses a gap in SVRP research, which traditionally relies on heuristic methods, by leveraging machine learning. The model outperforms the Ant-Colony Optimization algorithm, achieving a 1.73% reduction in travel costs. It uniquely integrates external information, demonstrating robustness in diverse environments, making it a valuable benchmark for future SVRP studies and industry application.
- [258] arXiv:2402.09769 [ pdf , ps , other ]
-
Title: Representation Learning Using a Single Forward PassComments: Under reviewSubjects: Artificial Intelligence (cs.AI)
Abstract: We propose a neuroscience-inspired Solo Pass Embedded Learning Algorithm (SPELA). SPELA is a prime candidate for training and inference applications in Edge AI devices. At the same time, SPELA can optimally cater to the need for a framework to study perceptual representation learning and formation. SPELA has distinctive features such as neural priors (in the form of embedded vectors), no weight transport, no update locking of weights, complete local Hebbian learning, single forward pass with no storage of activations, and single weight update per sample. Juxtaposed with traditional approaches, SPELA operates without the need for backpropagation. We show that our algorithm can perform nonlinear classification on a noisy boolean operation dataset. Additionally, we exhibit high performance using SPELA across MNIST, KMNIST, and Fashion MNIST. Lastly, we show the few-shot and 1-epoch learning capabilities of SPELA on MNIST, KMNIST, and Fashion MNIST, where it consistently outperforms backpropagation.
- [259] arXiv:2402.09836 [ pdf , ps , other ]
-
Title: Beyond Imitation: Generating Human Mobility from Context-aware Reasoning with Large Language ModelsSubjects: Artificial Intelligence (cs.AI)
Abstract: Human mobility behaviours are closely linked to various important societal problems such as traffic congestion, and epidemic control. However, collecting mobility data can be prohibitively expensive and involves serious privacy issues, posing a pressing need for high-quality generative mobility models. Previous efforts focus on learning the behaviour distribution from training samples, and generate new mobility data by sampling the learned distributions. They cannot effectively capture the coherent intentions that drive mobility behavior, leading to low sample efficiency and semantic-awareness. Inspired by the emergent reasoning ability in LLMs, we propose a radical perspective shift that reformulates mobility generation as a commonsense reasoning problem. In this paper, we design a novel Mobility Generation as Reasoning (MobiGeaR) framework that prompts LLM to recursively generate mobility behaviour. Specifically, we design a context-aware chain-of-thoughts prompting technique to align LLMs with context-aware mobility behaviour by few-shot in-context learning. Besides, MobiGeaR employ a divide-and-coordinate mechanism to exploit the synergistic effect between LLM reasoning and mechanistic gravity model. It leverages the step-by-step LLM reasoning to recursively generate a temporal template of activity intentions, which are then mapped to physical locations with a mechanistic gravity model. Experiments on two real-world datasets show MobiGeaR achieves state-of-the-art performance across all metrics, and substantially reduces the size of training samples at the same time. Besides, MobiGeaR also significantly improves the semantic-awareness of mobility generation by improving the intention accuracy by 62.23% and the generated mobility data is proven effective in boosting the performance of downstream applications. The implementation of our approach is available in the paper.
- [260] arXiv:2402.09844 [ pdf , ps , html , other ]
-
Title: Jack of All Trades, Master of Some, a Multi-Purpose Transformer AgentComments: Under reviewSubjects: Artificial Intelligence (cs.AI)
Abstract: The search for a general model that can operate seamlessly across multiple domains remains a key goal in machine learning research. The prevailing methodology in Reinforcement Learning (RL) typically limits models to a single task within a unimodal framework, a limitation that contrasts with the broader vision of a versatile, multi-domain model. In this paper, we present Jack of All Trades (JAT), a transformer-based model with a unique design optimized for handling sequential decision-making tasks and multimodal data types. The JAT model demonstrates its robust capabilities and versatility by achieving strong performance on very different RL benchmarks, along with promising results on Computer Vision (CV) and Natural Language Processing (NLP) tasks, all using a single set of weights. The JAT model marks a significant step towards more general, cross-domain AI model design, and notably, it is the first model of its kind to be fully open-sourced (see this https URL ), including a pioneering general-purpose dataset.
- [261] arXiv:2402.09877 [ pdf , ps , html , other ]
-
Title: On Computing Plans with Uniform Action CostsSubjects: Artificial Intelligence (cs.AI)
Abstract: In many real-world planning applications, agents might be interested in finding plans whose actions have costs that are as uniform as possible. Such plans provide agents with a sense of stability and predictability, which are key features when humans are the agents executing plans suggested by planning tools. This paper adapts three uniformity metrics to automated planning, and introduce planning-based compilations that allow to lexicographically optimize sum of action costs and action costs uniformity. Experimental results both in well-known and novel planning benchmarks show that the reformulated tasks can be effectively solved in practice to generate uniform plans.
- [262] arXiv:2402.09880 [ pdf , ps , other ]
-
Title: Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial IntelligenceSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Abstract: The rapid rise in popularity of Large Language Models (LLMs) with emerging capabilities has spurred public curiosity to evaluate and compare different LLMs, leading many researchers to propose their LLM benchmarks. Noticing preliminary inadequacies in those benchmarks, we embarked on a study to critically assess 23 state-of-the-art LLM benchmarks, using our novel unified evaluation framework through the lenses of people, process, and technology, under the pillars of functionality and security. Our research uncovered significant limitations, including biases, difficulties in measuring genuine reasoning, adaptability, implementation inconsistencies, prompt engineering complexity, evaluator diversity, and the overlooking of cultural and ideological norms in one comprehensive assessment. Our discussions emphasized the urgent need for standardized methodologies, regulatory certainties, and ethical guidelines in light of Artificial Intelligence (AI) advancements, including advocating for an evolution from static benchmarks to dynamic behavioral profiling to accurately capture LLMs' complex behaviors and potential risks. Our study highlighted the necessity for a paradigm shift in LLM evaluation methodologies, underlining the importance of collaborative efforts for the development of universally accepted benchmarks and the enhancement of AI systems' integration into society.
- [263] arXiv:2402.09919 [ pdf , ps , html , other ]
-
Title: Road Graph Generator: Mapping roads at construction sites from GPS dataComments: 18 pages, 4 figures, 3 tablesSubjects: Artificial Intelligence (cs.AI)
Abstract: We propose a new method for inferring roads from GPS trajectories to map construction sites. This task presents a unique challenge due to the erratic and non-standard movement patterns of construction machinery, which significantly diverge from typical vehicular traffic on established roads. Our proposed method first identifies intersections in the road network that serve as critical decision points, and then connects them with edges to produce a graph, which can subsequently be used for planning and task-allocation. We demonstrate the approach by mapping roads at a real-life construction site in Norway. The method is validated on four increasingly complex segments of the map. In our tests, the method achieved perfect accuracy in detecting intersections and inferring roads in data with no or low noise, while its performance was reduced in map areas with significant noise and consistently missing GPS updates.
- [264] arXiv:2402.09939 [ pdf , ps , other ]
-
Title: Generative AI in the Construction Industry: A State-of-the-art AnalysisRidwan Taiwo , Idris Temitope Bello , Sulemana Fatoama Abdulai , Abdul-Mugis Yussif , Babatunde Abiodun Salami , Abdullahi Saka , Tarek ZayedComments: 74 pages, 11 figures, 20 tablesSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Abstract: The construction industry is a vital sector of the global economy, but it faces many productivity challenges in various processes, such as design, planning, procurement, inspection, and maintenance. Generative artificial intelligence (AI), which can create novel and realistic data or content, such as text, image, video, or code, based on some input or prior knowledge, offers innovative and disruptive solutions to address these challenges. However, there is a gap in the literature on the current state, opportunities, and challenges of generative AI in the construction industry. This study aims to fill this gap by providing a state-of-the-art analysis of generative AI in construction, with three objectives: (1) to review and categorize the existing and emerging generative AI opportunities and challenges in the construction industry; (2) to propose a framework for construction firms to build customized generative AI solutions using their own data, comprising steps such as data collection, dataset curation, training custom large language model (LLM), model evaluation, and deployment; and (3) to demonstrate the framework via a case study of developing a generative model for querying contract documents. The results show that retrieval augmented generation (RAG) improves the baseline LLM by 5.2, 9.4, and 4.8% in terms of quality, relevance, and reproducibility. This study provides academics and construction professionals with a comprehensive analysis and practical framework to guide the adoption of generative AI techniques to enhance productivity, quality, safety, and sustainability across the construction industry.
- [265] arXiv:2402.09997 [ pdf , ps , other ]
-
Title: LoraRetriever: Input-Aware LoRA Retrieval and Composition for Mixed Tasks in the WildSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Low-Rank Adaptation (LoRA) provides an effective yet efficient solution for fine-tuning large language models (LLM). The modular and plug-and-play nature of LoRA enables the integration of diverse domain-specific LoRAs to enhance the capabilities of LLMs. Previous research on exploiting multiple LoRAs either focuses on specific isolated downstream tasks or fixes the selection of LoRAs during training. However, in real-world scenarios, LLMs receive diverse prompts covering different tasks, and the pool of candidate LoRAs is often dynamically updated. To bridge this gap, we propose LoraRetriever, a retrieve-then-compose framework that adaptively retrieves and composes multiple LoRAs according to the input prompts. LoraRetriever contains three main components: firstly, identifying and retrieving LoRAs relevant to the given input; secondly, formulating strategies for effectively integrating the retrieved LoRAs; and thirdly, developing efficient batch inference to accommodate heterogeneous requests. Experimental results indicate that LoraRetriever consistently outperforms the baselines, highlighting its practical effectiveness and versatility.
- [266] arXiv:2402.10011 [ pdf , ps , html , other ]
-
Title: Clifford Group Equivariant Simplicial Message Passing NetworksSubjects: Artificial Intelligence (cs.AI)
Abstract: We introduce Clifford Group Equivariant Simplicial Message Passing Networks, a method for steerable E(n)-equivariant message passing on simplicial complexes. Our method integrates the expressivity of Clifford group-equivariant layers with simplicial message passing, which is topologically more intricate than regular graph message passing. Clifford algebras include higher-order objects such as bivectors and trivectors, which express geometric features (e.g., areas, volumes) derived from vectors. Using this knowledge, we represent simplex features through geometric products of their vertices. To achieve efficient simplicial message passing, we share the parameters of the message network across different dimensions. Additionally, we restrict the final message to an aggregation of the incoming messages from different dimensions, leading to what we term shared simplicial message passing. Experimental results show that our method is able to outperform both equivariant and simplicial graph neural networks on a variety of geometric tasks.
- [267] arXiv:2402.10051 [ pdf , ps , html , other ]
-
Title: SwissNYF: Tool Grounded LLM Agents for Black Box SettingSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: While Large Language Models (LLMs) have demonstrated enhanced capabilities in function-calling, these advancements primarily rely on accessing the functions' responses. This methodology is practical for simpler APIs but faces scalability issues with irreversible APIs that significantly impact the system, such as a database deletion API. Similarly, processes requiring extensive time for each API call and those necessitating forward planning, like automated action pipelines, present complex challenges. Furthermore, scenarios often arise where a generalized approach is needed because algorithms lack direct access to the specific implementations of these functions or secrets to use them. Traditional tool planning methods are inadequate in these cases, compelling the need to operate within black-box environments. Unlike their performance in tool manipulation, LLMs excel in black-box tasks, such as program synthesis. Therefore, we harness the program synthesis capabilities of LLMs to strategize tool usage in black-box settings, ensuring solutions are verified prior to implementation. We introduce TOPGUN, an ingeniously crafted approach leveraging program synthesis for black box tool planning. Accompanied by SwissNYF, a comprehensive suite that integrates black-box algorithms for planning and verification tasks, addressing the aforementioned challenges and enhancing the versatility and effectiveness of LLMs in complex API interactions. The public code for SwissNYF is available at this https URL .
- [268] arXiv:2402.10083 [ pdf , ps , other ]
-
Title: Fine-tuning Large Language Model (LLM) Artificial Intelligence Chatbots in Ophthalmology and LLM-based evaluation using GPT-4Ting Fang Tan , Kabilan Elangovan , Liyuan Jin , Yao Jie , Li Yong , Joshua Lim , Stanley Poh , Wei Yan Ng , Daniel Lim , Yuhe Ke , Nan Liu , Daniel Shu Wei TingComments: 13 Pages, 1 Figure, 8 TablesSubjects: Artificial Intelligence (cs.AI)
Abstract: Purpose: To assess the alignment of GPT-4-based evaluation to human clinician experts, for the evaluation of responses to ophthalmology-related patient queries generated by fine-tuned LLM chatbots. Methods: 400 ophthalmology questions and paired answers were created by ophthalmologists to represent commonly asked patient questions, divided into fine-tuning (368; 92%), and testing (40; 8%). We find-tuned 5 different LLMs, including LLAMA2-7b, LLAMA2-7b-Chat, LLAMA2-13b, and LLAMA2-13b-Chat. For the testing dataset, additional 8 glaucoma QnA pairs were included. 200 responses to the testing dataset were generated by 5 fine-tuned LLMs for evaluation. A customized clinical evaluation rubric was used to guide GPT-4 evaluation, grounded on clinical accuracy, relevance, patient safety, and ease of understanding. GPT-4 evaluation was then compared against ranking by 5 clinicians for clinical alignment. Results: Among all fine-tuned LLMs, GPT-3.5 scored the highest (87.1%), followed by LLAMA2-13b (80.9%), LLAMA2-13b-chat (75.5%), LLAMA2-7b-Chat (70%) and LLAMA2-7b (68.8%) based on the GPT-4 evaluation. GPT-4 evaluation demonstrated significant agreement with human clinician rankings, with Spearman and Kendall Tau correlation coefficients of 0.90 and 0.80 respectively; while correlation based on Cohen Kappa was more modest at 0.50. Notably, qualitative analysis and the glaucoma sub-analysis revealed clinical inaccuracies in the LLM-generated responses, which were appropriately identified by the GPT-4 evaluation. Conclusion: The notable clinical alignment of GPT-4 evaluation highlighted its potential to streamline the clinical evaluation of LLM chatbot responses to healthcare-related queries. By complementing the existing clinician-dependent manual grading, this efficient and automated evaluation could assist the validation of future developments in LLM applications for healthcare.
- [269] arXiv:2402.10102 [ pdf , ps , other ]
-
Title: A privacy-preserving, distributed and cooperative FCM-based learning approach for Cancer ResearchComments: Rough Sets: International Joint Conference, IJCRS 2020Subjects: Artificial Intelligence (cs.AI) ; Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: Distributed Artificial Intelligence is attracting interest day by day. In this paper, the authors introduce an innovative methodology for distributed learning of Particle Swarm Optimization-based Fuzzy Cognitive Maps in a privacy-preserving way. The authors design a training scheme for collaborative FCM learning that offers data privacy compliant with the current regulation. This method is applied to a cancer detection problem, proving that the performance of the model is improved by the Federated Learning process, and obtaining similar results to the ones that can be found in the literature.
- [270] arXiv:2402.10104 [ pdf , ps , other ]
-
Title: GeoEval: Benchmark for Evaluating LLMs and Multi-Modal Models on Geometry Problem-SolvingSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Recent advancements in Large Language Models (LLMs) and Multi-Modal Models (MMs) have demonstrated their remarkable capabilities in problem-solving. Yet, their proficiency in tackling geometry math problems, which necessitates an integrated understanding of both textual and visual information, has not been thoroughly evaluated. To address this gap, we introduce the GeoEval benchmark, a comprehensive collection that includes a main subset of 2000 problems, a 750 problem subset focusing on backward reasoning, an augmented subset of 2000 problems, and a hard subset of 300 problems. This benchmark facilitates a deeper investigation into the performance of LLMs and MMs on solving geometry math problems. Our evaluation of ten LLMs and MMs across these varied subsets reveals that the WizardMath model excels, achieving a 55.67\% accuracy rate on the main subset but only a 6.00\% accuracy on the challenging subset. This highlights the critical need for testing models against datasets on which they have not been pre-trained. Additionally, our findings indicate that GPT-series models perform more effectively on problems they have rephrased, suggesting a promising method for enhancing model capabilities.
- [271] arXiv:2402.10109 [ pdf , ps , html , other ]
-
Title: Towards Reducing Diagnostic Errors with Interpretable Risk PredictionDenis Jered McInerney , William Dickinson , Lucy C. Flynn , Andrea C. Young , Geoffrey S. Young , Jan-Willem van de Meent , Byron C. WallaceSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Many diagnostic errors occur because clinicians cannot easily access relevant information in patient Electronic Health Records (EHRs). In this work we propose a method to use LLMs to identify pieces of evidence in patient EHR data that indicate increased or decreased risk of specific diagnoses; our ultimate aim is to increase access to evidence and reduce diagnostic errors. In particular, we propose a Neural Additive Model to make predictions backed by evidence with individualized risk estimates at time-points where clinicians are still uncertain, aiming to specifically mitigate delays in diagnosis and errors stemming from an incomplete differential. To train such a model, it is necessary to infer temporally fine-grained retrospective labels of eventual "true" diagnoses. We do so with LLMs, to ensure that the input text is from before a confident diagnosis can be made. We use an LLM to retrieve an initial pool of evidence, but then refine this set of evidence according to correlations learned by the model. We conduct an in-depth evaluation of the usefulness of our approach by simulating how it might be used by a clinician to decide between a pre-defined list of differential diagnoses.
- [272] arXiv:2402.10115 [ pdf , ps , other ]
-
Title: Generating Visual Stimuli from EEG Recordings using Transformer-encoder based EEG encoder and GANSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Signal Processing (eess.SP); Neurons and Cognition (q-bio.NC)
Abstract: In this study, we tackle a modern research challenge within the field of perceptual brain decoding, which revolves around synthesizing images from EEG signals using an adversarial deep learning framework. The specific objective is to recreate images belonging to various object categories by leveraging EEG recordings obtained while subjects view those images. To achieve this, we employ a Transformer-encoder based EEG encoder to produce EEG encodings, which serve as inputs to the generator component of the GAN network. Alongside the adversarial loss, we also incorporate perceptual loss to enhance the quality of the generated images.
- [273] arXiv:2402.10133 [ pdf , ps , other ]
-
Title: Zero-Shot Reasoning: Personalized Content Generation Without the Cold Start ProblemDavor Hafnar (1), Jure Demšar (1 and 2) ((1) Faculty of Computer and Information Science, University of Ljubljana (2) Department of Psychology, Faculty of Arts, University of Ljubljana)Comments: 9 pages, 6 figuresSubjects: Artificial Intelligence (cs.AI)
Abstract: Procedural content generation uses algorithmic techniques to create large amounts of new content for games at much lower production costs. In newer approaches, procedural content generation utilizes machine learning. However, these methods usually require expensive collection of large amounts of data, as well as the development and training of fairly complex learning models, which can be both extremely time-consuming and expensive. The core of our research is to explore whether we can lower the barrier to the use of personalized procedural content generation through a more practical and generalizable approach with large language models. Matching game content with player preferences benefits both players, who enjoy the game more, and developers, who increasingly depend on players enjoying the game before being able to monetize it. Therefore, this paper presents a novel approach to achieving personalization by using large language models to propose levels based on the gameplay data continuously collected from individual players. We compared the levels generated using our approach with levels generated with more traditional procedural generation techniques. Our easily reproducible method has proven viable in a production setting and outperformed levels generated by traditional methods in the probability that a player will not quit the game mid-level.
- [274] arXiv:2402.10172 [ pdf , ps , other ]
-
Title: OptiMUS: Scalable Optimization Modeling with (MI)LP Solvers and Large Language ModelsSubjects: Artificial Intelligence (cs.AI) ; Multiagent Systems (cs.MA)
Abstract: Optimization problems are pervasive in sectors from manufacturing and distribution to healthcare. However, most such problems are still solved heuristically by hand rather than optimally by state-of-the-art solvers because the expertise required to formulate and solve these problems limits the widespread adoption of optimization tools and techniques. This paper introduces OptiMUS, a Large Language Model (LLM)-based agent designed to formulate and solve (mixed integer) linear programming problems from their natural language descriptions. OptiMUS can develop mathematical models, write and debug solver code, evaluate the generated solutions, and improve its model and code based on these evaluations. OptiMUS utilizes a modular structure to process problems, allowing it to handle problems with long descriptions and complex data without long prompts. Experiments demonstrate that OptiMUS outperforms existing state-of-the-art methods on easy datasets by more than $20\%$ and on hard datasets (including a new dataset, NLP4LP, released with this paper that features long and complex problems) by more than $30\%$.
- [275] arXiv:2402.10290 [ pdf , ps , html , other ]
-
Title: Experiments with Encoding Structured Data for Neural NetworksComments: 18 pages, 8 figures, 2 tablesSubjects: Artificial Intelligence (cs.AI)
Abstract: The project's aim is to create an AI agent capable of selecting good actions in a game-playing domain called Battlespace. Sequential domains like Battlespace are important testbeds for planning problems, as such, the Department of Defense uses such domains for wargaming exercises. The agents we developed combine Monte Carlo Tree Search (MCTS) and Deep Q-Network (DQN) techniques in an effort to navigate the game environment, avoid obstacles, interact with adversaries, and capture the flag. This paper will focus on the encoding techniques we explored to present complex structured data stored in a Python class, a necessary precursor to an agent.
- [276] arXiv:2402.10416 [ pdf , ps , html , other ]
-
Title: Grounding Language about Belief in a Bayesian Theory-of-MindComments: Under Review, 7 pagesSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Despite the fact that beliefs are mental states that cannot be directly observed, humans talk about each others' beliefs on a regular basis, often using rich compositional language to describe what others think and know. What explains this capacity to interpret the hidden epistemic content of other minds? In this paper, we take a step towards an answer by grounding the semantics of belief statements in a Bayesian theory-of-mind: By modeling how humans jointly infer coherent sets of goals, beliefs, and plans that explain an agent's actions, then evaluating statements about the agent's beliefs against these inferences via epistemic logic, our framework provides a conceptual role semantics for belief, explaining the gradedness and compositionality of human belief attributions, as well as their intimate connection with goals and plans. We evaluate this framework by studying how humans attribute goals and beliefs while watching an agent solve a doors-and-keys gridworld puzzle that requires instrumental reasoning about hidden objects. In contrast to pure logical deduction, non-mentalizing baselines, and mentalizing that ignores the role of instrumental plans, our model provides a much better fit to human goal and belief attributions, demonstrating the importance of theory-of-mind for a semantics of belief.
- [277] arXiv:2402.10705 [ pdf , ps , other ]
-
Title: AutoSAT: Automatically Optimize SAT Solvers via Large Language ModelsSubjects: Artificial Intelligence (cs.AI)
Abstract: Heuristics are crucial in SAT solvers, while no heuristic rules are suitable for all problem instances. Therefore, it typically requires to refine specific solvers for specific problem instances. In this context, we present AutoSAT, a novel framework for automatically optimizing heuristics in SAT solvers. AutoSAT is based on Large Large Models (LLMs) which is able to autonomously generate code, conduct evaluation, then utilize the feedback to further optimize heuristics, thereby reducing human intervention and enhancing solver capabilities. AutoSAT operates on a plug-and-play basis, eliminating the need for extensive preliminary setup and model training, and fosters a Chain of Thought collaborative process with fault-tolerance, ensuring robust heuristic optimization. Extensive experiments on a Conflict-Driven Clause Learning (CDCL) solver demonstrates the overall superior performance of AutoSAT, especially in solving some specific SAT problem instances.
- [278] arXiv:2402.10725 [ pdf , ps , html , other ]
-
Title: Cloud Kitchen: Using Planning-based Composite AI to Optimize Food Delivery ProcessSubjects: Artificial Intelligence (cs.AI) ; Logic in Computer Science (cs.LO)
Abstract: The global food delivery market provides many opportunities for AI-based services that can improve the efficiency of feeding the world. This paper presents the Cloud Kitchen platform as a decision-making tool for restaurants with food delivery and a simulator to evaluate the impact of the decisions. The platform consists of a Technology-Specific Bridge (TSB) that provides an interface for communicating with restaurants or the simulator. TSB uses a PDDL model to represent decisions embedded in the Unified Planning Framework (UPF). Decision-making, which concerns allocating customers' orders to vehicles and deciding in which order the customers will be served (for each vehicle), is done via a Vehicle Routing Problem with Time Windows (VRPTW), an efficient tool for this problem. We show that decisions made by our platform can improve customer satisfaction by reducing the number of delayed deliveries using a real-world historical dataset.
- [279] arXiv:2402.10726 [ pdf , ps , other ]
-
Title: Learning Planning Action Models from State TracesTomáš Balyo , Martin Suda , Lukáš Chrpa , Dominik Šafránek , Filip Dvořák , Roman Barták , G. Michael YoungbloodSubjects: Artificial Intelligence (cs.AI)
Abstract: Previous STRIPS domain model acquisition approaches that learn from state traces start with the names and parameters of the actions to be learned. Therefore their only task is to deduce the preconditions and effects of the given actions. In this work, we explore learning in situations when the parameters of learned actions are not provided. We define two levels of trace quality based on which information is provided and present an algorithm for each. In one level (L1), the states in the traces are labeled with action names, so we can deduce the number and names of the actions, but we still need to work out the number and types of parameters. In the other level (L2), the states are additionally labeled with objects that constitute the parameters of the corresponding grounded actions. Here we still need to deduce the types of the parameters in the learned actions. We experimentally evaluate the proposed algorithms and compare them with the state-of-the-art learning tool FAMA on a large collection of IPC benchmarks. The evaluation shows that our new algorithms are faster, can handle larger inputs and provide better results in terms of learning action models more similar to reference models.
- [280] arXiv:2402.10762 [ pdf , ps , other ]
-
Title: On Explaining Unfairness: An OverviewSubjects: Artificial Intelligence (cs.AI)
Abstract: Algorithmic fairness and explainability are foundational elements for achieving responsible AI. In this paper, we focus on their interplay, a research area that is recently receiving increasing attention. To this end, we first present two comprehensive taxonomies, each representing one of the two complementary fields of study: fairness and explanations. Then, we categorize explanations for fairness into three types: (a) Explanations to enhance fairness metrics, (b) Explanations to help us understand the causes of (un)fairness, and (c) Explanations to assist us in designing methods for mitigating unfairness. Finally, based on our fairness and explanation taxonomies, we present undiscovered literature paths revealing gaps that can serve as valuable insights for future research.
- [281] arXiv:2402.10877 [ pdf , ps , html , other ]
-
Title: Robust agents learn causal world modelsComments: ICLR 2024 (oral). Proofs in appendix simplified. Typos correctedSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: It has long been hypothesised that causal reasoning plays a fundamental role in robust and general intelligence. However, it is not known if agents must learn causal models in order to generalise to new domains, or if other inductive biases are sufficient. We answer this question, showing that any agent capable of satisfying a regret bound under a large set of distributional shifts must have learned an approximate causal model of the data generating process, which converges to the true causal model for optimal agents. We discuss the implications of this result for several research areas including transfer learning and causal inference.
- [282] arXiv:2402.10888 [ pdf , ps , other ]
-
Title: Explainability for Machine Learning Models: From Data Adaptability to User PerceptionComments: PhD ThesisSubjects: Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Abstract: This thesis explores the generation of local explanations for already deployed machine learning models, aiming to identify optimal conditions for producing meaningful explanations considering both data and user requirements. The primary goal is to develop methods for generating explanations for any model while ensuring that these explanations remain faithful to the underlying model and comprehensible to the users.
The thesis is divided into two parts. The first enhances a widely used rule-based explanation method. It then introduces a novel approach for evaluating the suitability of linear explanations to approximate a model. Additionally, it conducts a comparative experiment between two families of counterfactual explanation methods to analyze the advantages of one over the other. The second part focuses on user experiments to assess the impact of three explanation methods and two distinct representations. These experiments measure how users perceive their interaction with the model in terms of understanding and trust, depending on the explanations and representations. This research contributes to a better explanation generation, with potential implications for enhancing the transparency, trustworthiness, and usability of deployed AI systems. - [283] arXiv:2402.10921 [ pdf , ps , html , other ]
-
Title: AM^2-EmoJE: Adaptive Missing-Modality Emotion Recognition in Conversation via Joint Embedding LearningSubjects: Artificial Intelligence (cs.AI) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: Human emotion can be presented in different modes i.e., audio, video, and text. However, the contribution of each mode in exhibiting each emotion is not uniform. Furthermore, the availability of complete mode-specific details may not always be guaranteed in the test time. In this work, we propose AM^2-EmoJE, a model for Adaptive Missing-Modality Emotion Recognition in Conversation via Joint Embedding Learning model that is grounded on two-fold contributions: First, a query adaptive fusion that can automatically learn the relative importance of its mode-specific representations in a query-specific manner. By this the model aims to prioritize the mode-invariant spatial query details of the emotion patterns, while also retaining its mode-exclusive aspects within the learned multimodal query descriptor. Second the multimodal joint embedding learning module that explicitly addresses various missing modality scenarios in test-time. By this, the model learns to emphasize on the correlated patterns across modalities, which may help align the cross-attended mode-specific descriptors pairwise within a joint-embedding space and thereby compensate for missing modalities during inference. By leveraging the spatio-temporal details at the dialogue level, the proposed AM^2-EmoJE not only demonstrates superior performance compared to the best-performing state-of-the-art multimodal methods, by effectively leveraging body language in place of face expression, it also exhibits an enhanced privacy feature. By reporting around 2-5% improvement in the weighted-F1 score, the proposed multimodal joint embedding module facilitates an impressive performance gain in a variety of missing-modality query scenarios during test time.
- [284] arXiv:2402.10967 [ pdf , ps , other ]
-
Title: Social network analysis for personalized characterization and risk assessment of alcohol use disorders in adolescents using semantic technologiesJosé Alberto Benítez-Andrades , Isaías García-Rodríguez , Carmen Benavides , Héctor Alaiz-Moretón , Alejandro Rodríguez-GonzálezJournal-ref: Future Generation Computer Systems, Volume 106, May 2020, Pages 154-170Subjects: Artificial Intelligence (cs.AI)
Abstract: Alcohol Use Disorder (AUD) is a major concern for public health organizations worldwide, especially as regards the adolescent population. The consumption of alcohol in adolescents is known to be influenced by seeing friends and even parents drinking alcohol. Building on this fact, a number of studies into alcohol consumption among adolescents have made use of Social Network Analysis (SNA) techniques to study the different social networks (peers, friends, family, etc.) with whom the adolescent is involved. These kinds of studies need an initial phase of data gathering by means of questionnaires and a subsequent analysis phase using the SNA techniques. The process involves a number of manual data handling stages that are time consuming and error-prone. The use of knowledge engineering techniques (including the construction of a domain ontology) to represent the information, allows the automation of all the activities, from the initial data collection to the results of the SNA study. This paper shows how a knowledge model is constructed, and compares the results obtained using the traditional method with this, fully automated model, detailing the main advantages of the latter. In the case of the SNA analysis, the validity of the results obtained with the knowledge engineering approach are compared to those obtained manually using the UCINET, Cytoscape, Pajek and Gephi to test the accuracy of the knowledge model.
- [285] arXiv:2402.11359 [ pdf , ps , other ]
-
Title: Offline Training of Language Model Agents with Functions as Learnable WeightsComments: 22 pages, 10 figuresSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Researchers and practitioners have recently reframed powerful Large Language Models (LLMs) as agents, enabling them to automate complex tasks largely via the use of specialized functions. To facilitate the development of LLM agents, we present a novel paradigm of training LLM agents without modifying the LLM weights, which is particularly useful when the LLMs are difficult or inaccessible for modifications. Inspired by how humans continuously forge tools to adapt to real-world tasks, rather than change our biological structure to fit a static set of tools, we propose to progressively forge agent's functions to better solve the downstream tasks instead of modifying the LLM weights. By treating the functions as learnable `agent parameters' and leveraging the fundamental idea of model training in artificial intelligence, we develop AgentOptimizer that employs the LLM to update agents' functions and devise an agent training algorithm with two strategies, roll-back, and early-stop, to streamline the training process. With extensive experiments, we showcase that the agent training paradigm could significantly improve the performance of representative LLM agents in various downstream tasks. We also study the behavior of the agent training regarding aspects like the learning curve and domain transferability.
- [286] arXiv:2402.11403 [ pdf , ps , html , other ]
-
Title: An Empirical Evaluation of Neural and Neuro-symbolic Approaches to Real-time Multimodal Complex Event DetectionSubjects: Artificial Intelligence (cs.AI)
Abstract: Robots and autonomous systems require an understanding of complex events (CEs) from sensor data to interact with their environments and humans effectively. Traditional end-to-end neural architectures, despite processing sensor data efficiently, struggle with long-duration events due to limited context sizes and reasoning capabilities. Recent advances in neuro-symbolic methods, which integrate neural and symbolic models leveraging human knowledge, promise improved performance with less data. This study addresses the gap in understanding these approaches' effectiveness in complex event detection (CED), especially in temporal reasoning. We investigate neural and neuro-symbolic architectures' performance in a multimodal CED task, analyzing IMU and acoustic data streams to recognize CE patterns. Our methodology includes (i) end-to-end neural architectures for direct CE detection from sensor embeddings, (ii) two-stage concept-based neural models mapping sensor embeddings to atomic events (AEs) before CE detection, and (iii) a neuro-symbolic approach using a symbolic finite-state machine for CE detection from AEs. Empirically, the neuro-symbolic architecture significantly surpasses purely neural models, demonstrating superior performance in CE recognition, even with extensive training data and ample temporal context for neural approaches.
- [287] arXiv:2402.11461 [ pdf , ps , html , other ]
-
Title: FGeo-HyperGNet: Geometric Problem Solving Integrating Formal Symbolic System and Hypergraph Neural NetworkComments: 13 pagesSubjects: Artificial Intelligence (cs.AI)
Abstract: Geometric problem solving has always been a long-standing challenge in the fields of automated reasoning and artificial intelligence. We built a neural-symbolic system to automatically perform human-like geometric deductive reasoning. The symbolic part is a formal system built on FormalGeo, which can automatically perform geomertic relational reasoning and algebraic calculations and organize the solving process into a solution hypertree with conditions as hypernodes and theorems as hyperedges. The neural part, called HyperGNet, is a hypergraph neural network based on the attention mechanism, including a encoder to effectively encode the structural and semantic information of the hypertree, and a solver to provide problem-solving guidance. The neural part predicts theorems according to the hypertree, and the symbolic part applies theorems and updates the hypertree, thus forming a predict-apply cycle to ultimately achieve readable and traceable automatic solving of geometric problems. Experiments demonstrate the correctness and effectiveness of this neural-symbolic architecture. We achieved a step-wised accuracy of 87.65% and an overall accuracy of 85.53% on the formalgeo7k datasets.
- [288] arXiv:2402.11653 [ pdf , ps , html , other ]
-
Title: Combinatorial Client-Master Multiagent Deep Reinforcement Learning for Task Offloading in Mobile Edge ComputingComments: 11 pages, 5 figures, 2 tablesSubjects: Artificial Intelligence (cs.AI) ; Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)
Abstract: Recently, there has been an explosion of mobile applications that perform computationally intensive tasks such as video streaming, data mining, virtual reality, augmented reality, image processing, video processing, face recognition, and online gaming. However, user devices (UDs), such as tablets and smartphones, have a limited ability to perform the computation needs of the tasks. Mobile edge computing (MEC) has emerged as a promising technology to meet the increasing computing demands of UDs. Task offloading in MEC is a strategy that meets the demands of UDs by distributing tasks between UDs and MEC servers. Deep reinforcement learning (DRL) is gaining attention in task-offloading problems because it can adapt to dynamic changes and minimize online computational complexity. However, the various types of continuous and discrete resource constraints on UDs and MEC servers pose challenges to the design of an efficient DRL-based task-offloading strategy. Existing DRL-based task-offloading algorithms focus on the constraints of the UDs, assuming the availability of enough storage resources on the server. Moreover, existing multiagent DRL (MADRL)--based task-offloading algorithms are homogeneous agents and consider homogeneous constraints as a penalty in their reward function. We proposed a novel combinatorial client-master MADRL (CCM\_MADRL) algorithm for task offloading in MEC (CCM\_MADRL\_MEC) that enables UDs to decide their resource requirements and the server to make a combinatorial decision based on the requirements of the UDs. CCM\_MADRL\_MEC is the first MADRL in task offloading to consider server storage capacity in addition to the constraints in the UDs. By taking advantage of the combinatorial action selection, CCM\_MADRL\_MEC has shown superior convergence over existing MADDPG and heuristic algorithms.
- [289] arXiv:2402.11658 [ pdf , ps , other ]
-
Title: Dynamic planning in hierarchical active inferenceSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Robotics (cs.RO)
Abstract: By dynamic planning, we refer to the ability of the human brain to infer and impose motor trajectories related to cognitive decisions. A recent paradigm, active inference, brings fundamental insights into the adaptation of biological organisms, constantly striving to minimize prediction errors to restrict themselves to life-compatible states. Over the past years, many studies have shown how human and animal behavior could be explained in terms of an active inferential process -- either as discrete decision-making or continuous motor control -- inspiring innovative solutions in robotics and artificial intelligence. Still, the literature lacks a comprehensive outlook on how to effectively plan actions in changing environments. Setting ourselves the goal of modeling tool use, we delve into the topic of dynamic planning in active inference, keeping in mind two crucial aspects of biological goal-directed behavior: the capacity to understand and exploit affordances for object manipulation, and to learn the hierarchical interactions between the self and the environment, including other agents. We start from a simple unit and gradually describe more advanced structures, comparing recently proposed design choices and providing basic examples for each section. This study distances itself from traditional views centered on neural networks and reinforcement learning, and points toward a yet unexplored direction in active inference: hybrid representations in hierarchical models.
- [290] arXiv:2402.11804 [ pdf , ps , other ]
-
Title: LLM as Prompter: Low-resource Inductive Reasoning on Arbitrary Knowledge GraphsComments: 16 pages, 6 figuresSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Social and Information Networks (cs.SI)
Abstract: Knowledge Graph (KG) inductive reasoning, which aims to infer missing facts from new KGs that are not seen during training, has been widely adopted in various applications. One critical challenge of KG inductive reasoning is handling low-resource scenarios with scarcity in both textual and structural aspects. In this paper, we attempt to address this challenge with Large Language Models (LLMs). Particularly, we utilize the state-of-the-art LLMs to generate a graph-structural prompt to enhance the pre-trained Graph Neural Networks (GNNs), which brings us new methodological insights into the KG inductive reasoning methods, as well as high generalizability in practice. On the methodological side, we introduce a novel pretraining and prompting framework ProLINK, designed for low-resource inductive reasoning across arbitrary KGs without requiring additional training. On the practical side, we experimentally evaluate our approach on 36 low-resource KG datasets and find that ProLINK outperforms previous methods in three-shot, one-shot, and zero-shot reasoning tasks, exhibiting average performance improvements by 20%, 45%, and 147%, respectively. Furthermore, ProLINK demonstrates strong robustness for various LLM promptings as well as full-shot scenarios.
- [291] arXiv:2402.11893 [ pdf , ps , other ]
-
Title: Discerning and Resolving Knowledge Conflicts through Adaptive Decoding with Contextual Information-Entropy ConstraintSubjects: Artificial Intelligence (cs.AI)
Abstract: Large language models internalize enormous parametric knowledge during pre-training. Concurrently, realistic applications necessitate external contextual knowledge to aid models on the underlying tasks. This raises a crucial dilemma known as knowledge conflicts, where the contextual knowledge clashes with the However, existing decoding works are specialized in resolving knowledge conflicts and could inadvertently deteriorate performance in absence of conflicts. In this paper, we propose an adaptive decoding method, termed as contextual information-entropy constraint decoding (COIECD), to discern whether the knowledge conflicts occur and resolve them. It can improve the model's faithfulness to conflicting context, and simultaneously maintain high performance among non- Our experiments show that COIECD exhibits strong performance and robustness over knowledge conflicts in realistic datasets. Code is available.
- [292] arXiv:2402.11901 [ pdf , ps , html , other ]
-
Title: Real-World Planning with PDDL+ and BeyondSubjects: Artificial Intelligence (cs.AI)
Abstract: Real-world applications of AI Planning often require a highly expressive modeling language to accurately capture important intricacies of target systems. Hybrid systems are ubiquitous in the real-world, and PDDL+ is the standardized modeling language for capturing such systems as planning domains. PDDL+ enables accurate encoding of mixed discrete-continuous system dynamics, exogenous activity, and many other interesting features exhibited in realistic scenarios. However, the uptake in usage of PDDL+ has been slow and apprehensive, largely due to a general shortage of PDDL+ planning software, and rigid limitations of the few existing planners. To overcome this chasm, we present Nyx, a novel PDDL+ planner built to emphasize lightness, simplicity, and, most importantly, adaptability. The planner is designed to be effortlessly customizable to expand its capabilities well beyond the scope of PDDL+. As a result, Nyx can be tailored to virtually any potential real-world application requiring some form of AI Planning, paving the way for wider adoption of planning methods for solving real-world problems.
- [293] arXiv:2402.12001 [ pdf , ps , html , other ]
-
Title: A Survey on Extractive Knowledge Graph Summarization: Applications, Approaches, Evaluation, and Future DirectionsComments: 9 pages, 13 figures, submitted to the IJCAI 2024 Survey TrackSubjects: Artificial Intelligence (cs.AI) ; Databases (cs.DB); Information Retrieval (cs.IR); Social and Information Networks (cs.SI)
Abstract: With the continuous growth of large Knowledge Graphs (KGs), extractive KG summarization becomes a trending task. Aiming at distilling a compact subgraph with condensed information, it facilitates various downstream KG-based tasks. In this survey paper, we are among the first to provide a systematic overview of its applications and define a taxonomy for existing methods from its interdisciplinary studies. Future directions are also laid out based on our extensive and comparative review.
- [294] arXiv:2402.12074 [ pdf , ps , other ]
-
Title: HIP Network: Historical Information Passing Network for Extrapolation Reasoning on Temporal Knowledge GraphComments: 7 pages, 3 figuresJournal-ref: IJCAI (2021) 1915-1921Subjects: Artificial Intelligence (cs.AI)
Abstract: In recent years, temporal knowledge graph (TKG) reasoning has received significant attention. Most existing methods assume that all timestamps and corresponding graphs are available during training, which makes it difficult to predict future events. To address this issue, recent works learn to infer future events based on historical information. However, these methods do not comprehensively consider the latent patterns behind temporal changes, to pass historical information selectively, update representations appropriately and predict events accurately. In this paper, we propose the Historical Information Passing (HIP) network to predict future events. HIP network passes information from temporal, structural and repetitive perspectives, which are used to model the temporal evolution of events, the interactions of events at the same time step, and the known events respectively. In particular, our method considers the updating of relation representations and adopts three scoring functions corresponding to the above dimensions. Experimental results on five benchmark datasets show the superiority of HIP network, and the significant improvements on Hits@1 prove that our method can more accurately predict what is going to happen.
- [295] arXiv:2402.12132 [ pdf , ps , other ]
-
Title: SSTKG: Simple Spatio-Temporal Knowledge Graph for Intepretable and Versatile Dynamic Information EmbeddingComments: for Web conf 2024. 8 pages contextSubjects: Artificial Intelligence (cs.AI)
Abstract: Knowledge graphs (KGs) have been increasingly employed for link prediction and recommendation using real-world datasets. However, the majority of current methods rely on static data, neglecting the dynamic nature and the hidden spatio-temporal attributes of real-world scenarios. This often results in suboptimal predictions and recommendations. Although there are effective spatio-temporal inference methods, they face challenges such as scalability with large datasets and inadequate semantic understanding, which impede their performance. To address these limitations, this paper introduces a novel framework - Simple Spatio-Temporal Knowledge Graph (SSTKG), for constructing and exploring spatio-temporal KGs. To integrate spatial and temporal data into KGs, our framework exploited through a new 3-step embedding method. Output embeddings can be used for future temporal sequence prediction and spatial information recommendation, providing valuable insights for various applications such as retail sales forecasting and traffic volume prediction. Our framework offers a simple but comprehensive way to understand the underlying patterns and trends in dynamic KG, thereby enhancing the accuracy of predictions and the relevance of recommendations. This work paves the way for more effective utilization of spatio-temporal data in KGs, with potential impacts across a wide range of sectors.
- [296] arXiv:2402.12183 [ pdf , ps , other ]
-
Title: MultiFIX: An XAI-friendly feature inducing approach to building models from multimodal dataComments: 8 pages, 9 figuresSubjects: Artificial Intelligence (cs.AI)
Abstract: In the health domain, decisions are often based on different data modalities. Thus, when creating prediction models, multimodal fusion approaches that can extract and combine relevant features from different data modalities, can be highly beneficial. Furthermore, it is important to understand how each modality impacts the final prediction, especially in high-stake domains, so that these models can be used in a trustworthy and responsible manner. We propose MultiFIX: a new interpretability-focused multimodal data fusion pipeline that explicitly induces separate features from different data types that can subsequently be combined to make a final prediction. An end-to-end deep learning architecture is used to train a predictive model and extract representative features of each modality. Each part of the model is then explained using explainable artificial intelligence techniques. Attention maps are used to highlight important regions in image inputs. Inherently interpretable symbolic expressions, learned with GP-GOMEA, are used to describe the contribution of tabular inputs. The fusion of the extracted features to predict the target label is also replaced by a symbolic expression, learned with GP-GOMEA. Results on synthetic problems demonstrate the strengths and limitations of MultiFIX. Lastly, we apply MultiFIX to a publicly available dataset for the detection of malignant skin lesions.
- [297] arXiv:2402.12275 [ pdf , ps , other ]
-
Title: WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the EnvironmentSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: We give a model-based agent that builds a Python program representing its knowledge of the world based on its interactions with the environment. The world model tries to explain its interactions, while also being optimistic about what reward it can achieve. We do this by extending work on program synthesis via LLMs. We study our agent on gridworlds, finding our approach is more sample-efficient compared to deep RL, and more compute-efficient compared to ReAct-style agents.
- [298] arXiv:2402.12327 [ pdf , ps , other ]
-
Title: Shall We Talk: Exploring Spontaneous Collaborations of Competing LLM AgentsZengqing Wu , Shuyuan Zheng , Qianying Liu , Xu Han , Brian Inhyuk Kwon , Makoto Onizuka , Shaojie Tang , Run Peng , Chuan XiaoComments: Source codes available at this https URLSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Computers and Society (cs.CY); Multiagent Systems (cs.MA); General Economics (econ.GN)
Abstract: Recent advancements have shown that agents powered by large language models (LLMs) possess capabilities to simulate human behaviors and societal dynamics. However, the potential for LLM agents to spontaneously establish collaborative relationships in the absence of explicit instructions has not been studied. To address this gap, we conduct three case studies, revealing that LLM agents are capable of spontaneously forming collaborations even within competitive settings. This finding not only demonstrates the capacity of LLM agents to mimic competition and cooperation in human societies but also validates a promising vision of computational social science. Specifically, it suggests that LLM agents could be utilized to model human social interactions, including those with spontaneous collaborations, thus offering insights into social phenomena. The source codes for this study are available at this https URL .
- [299] arXiv:2402.12381 [ pdf , ps , html , other ]
-
Title: Constrained Multi-objective Optimization with Deep Reinforcement Learning Assisted Operator SelectionSubjects: Artificial Intelligence (cs.AI) ; Neural and Evolutionary Computing (cs.NE)
Abstract: Solving constrained multi-objective optimization problems with evolutionary algorithms has attracted considerable attention. Various constrained multi-objective optimization evolutionary algorithms (CMOEAs) have been developed with the use of different algorithmic strategies, evolutionary operators, and constraint-handling techniques. The performance of CMOEAs may be heavily dependent on the operators used, however, it is usually difficult to select suitable operators for the problem at hand. Hence, improving operator selection is promising and necessary for CMOEAs. This work proposes an online operator selection framework assisted by Deep Reinforcement Learning. The dynamics of the population, including convergence, diversity, and feasibility, are regarded as the state; the candidate operators are considered as actions; and the improvement of the population state is treated as the reward. By using a Q-Network to learn a policy to estimate the Q-values of all actions, the proposed approach can adaptively select an operator that maximizes the improvement of the population according to the current state and thereby improve the algorithmic performance. The framework is embedded into four popular CMOEAs and assessed on 42 benchmark problems. The experimental results reveal that the proposed Deep Reinforcement Learning-assisted operator selection significantly improves the performance of these CMOEAs and the resulting algorithm obtains better versatility compared to nine state-of-the-art CMOEAs.
- [300] arXiv:2402.12422 [ pdf , ps , html , other ]
-
Title: Simulacra as Conscious ExoticaSubjects: Artificial Intelligence (cs.AI)
Abstract: The advent of conversational agents with increasingly human-like behaviour throws old philosophical questions into new light. Does it, or could it, ever make sense to speak of AI agents built out of generative language models in terms of consciousness, given that they are "mere" simulacra of human behaviour, and that what they do can be seen as "merely" role play? Drawing on the later writings of Wittgenstein, this paper attempts to tackle this question while avoiding the pitfalls of dualistic thinking.
- [301] arXiv:2402.12608 [ pdf , ps , html , other ]
-
Title: Patient-Centric Knowledge Graphs: A Survey of Current Methods, Challenges, and ApplicationsHassan S. Al Khatib , Subash Neupane , Harish Kumar Manchukonda , Noorbakhsh Amiri Golilarz , Sudip Mittal , Amin Amirlatifi , Shahram RahimiSubjects: Artificial Intelligence (cs.AI)
Abstract: Patient-Centric Knowledge Graphs (PCKGs) represent an important shift in healthcare that focuses on individualized patient care by mapping the patient's health information in a holistic and multi-dimensional way. PCKGs integrate various types of health data to provide healthcare professionals with a comprehensive understanding of a patient's health, enabling more personalized and effective care. This literature review explores the methodologies, challenges, and opportunities associated with PCKGs, focusing on their role in integrating disparate healthcare data and enhancing patient care through a unified health perspective. In addition, this review also discusses the complexities of PCKG development, including ontology design, data integration techniques, knowledge extraction, and structured representation of knowledge. It highlights advanced techniques such as reasoning, semantic search, and inference mechanisms essential in constructing and evaluating PCKGs for actionable healthcare insights. We further explore the practical applications of PCKGs in personalized medicine, emphasizing their significance in improving disease prediction and formulating effective treatment plans. Overall, this review provides a foundational perspective on the current state-of-the-art and best practices of PCKGs, guiding future research and applications in this dynamic field.
- [302] arXiv:2402.12685 [ pdf , ps , html , other ]
-
Title: XRL-Bench: A Benchmark for Evaluating and Comparing Explainable Reinforcement Learning TechniquesYu Xiong , Zhipeng Hu , Ye Huang , Runze Wu , Kai Guan , Xingchen Fang , Ji Jiang , Tianze Zhou , Yujing Hu , Haoyu Liu , Tangjie Lyu , Changjie FanComments: 10 pages, 5 figuresSubjects: Artificial Intelligence (cs.AI)
Abstract: Reinforcement Learning (RL) has demonstrated substantial potential across diverse fields, yet understanding its decision-making process, especially in real-world scenarios where rationality and safety are paramount, is an ongoing challenge. This paper delves in to Explainable RL (XRL), a subfield of Explainable AI (XAI) aimed at unravelling the complexities of RL models. Our focus rests on state-explaining techniques, a crucial subset within XRL methods, as they reveal the underlying factors influencing an agent's actions at any given time. Despite their significant role, the lack of a unified evaluation framework hinders assessment of their accuracy and effectiveness. To address this, we introduce XRL-Bench, a unified standardized benchmark tailored for the evaluation and comparison of XRL methods, encompassing three main modules: standard RL environments, explainers based on state importance, and standard evaluators. XRL-Bench supports both tabular and image data for state explanation. We also propose TabularSHAP, an innovative and competitive XRL method. We demonstrate the practical utility of TabularSHAP in real-world online gaming services and offer an open-source benchmark platform for the straightforward implementation and evaluation of XRL methods. Our contributions facilitate the continued progression of XRL technology.
- [303] arXiv:2402.12702 [ pdf , ps , html , other ]
-
Title: From Cloud to Edge: Rethinking Generative AI for Low-Resource Design ChallengesComments: Accepted for the Artificial Intelligence for Design Problems bridge program at AAAI 2024Subjects: Artificial Intelligence (cs.AI) ; Computers and Society (cs.CY)
Abstract: Generative Artificial Intelligence (AI) has shown tremendous prospects in all aspects of technology, including design. However, due to its heavy demand on resources, it is usually trained on large computing infrastructure and often made available as a cloud-based service. In this position paper, we consider the potential, challenges, and promising approaches for generative AI for design on the edge, i.e., in resource-constrained settings where memory, compute, energy (battery) and network connectivity may be limited. Adapting generative AI for such settings involves overcoming significant hurdles, primarily in how to streamline complex models to function efficiently in low-resource environments. This necessitates innovative approaches in model compression, efficient algorithmic design, and perhaps even leveraging edge computing. The objective is to harness the power of generative AI in creating bespoke solutions for design problems, such as medical interventions, farm equipment maintenance, and educational material design, tailored to the unique constraints and needs of remote areas. These efforts could democratize access to advanced technology and foster sustainable development, ensuring universal accessibility and environmental consideration of AI-driven design benefits.
- [304] arXiv:2402.12845 [ pdf , ps , html , other ]
-
Title: MORE-3S:Multimodal-based Offline Reinforcement Learning with Shared Semantic SpacesSubjects: Artificial Intelligence (cs.AI) ; Computer Science and Game Theory (cs.GT)
Abstract: Drawing upon the intuition that aligning different modalities to the same semantic embedding space would allow models to understand states and actions more easily, we propose a new perspective to the offline reinforcement learning (RL) challenge. More concretely, we transform it into a supervised learning task by integrating multimodal and pre-trained language models. Our approach incorporates state information derived from images and action-related data obtained from text, thereby bolstering RL training performance and promoting long-term strategic thinking. We emphasize the contextual understanding of language and demonstrate how decision-making in RL can benefit from aligning states' and actions' representation with languages' representation. Our method significantly outperforms current baselines as evidenced by evaluations conducted on Atari and OpenAI Gym environments. This contributes to advancing offline RL performance and efficiency while providing a novel perspective on offline RL.Our code and data are available at this https URL .
- [305] arXiv:2402.12887 [ pdf , ps , html , other ]
-
Title: The practice of qualitative parameterisation in the development of Bayesian networksComments: 6 pages, 2 figures, technical noteSubjects: Artificial Intelligence (cs.AI)
Abstract: The typical phases of Bayesian network (BN) structured development include specification of purpose and scope, structure development, parameterisation and validation. Structure development is typically focused on qualitative issues and parameterisation quantitative issues, however there are qualitative and quantitative issues that arise in both phases. A common step that occurs after the initial structure has been developed is to perform a rough parameterisation that only captures and illustrates the intended qualitative behaviour of the model. This is done prior to a more rigorous parameterisation, ensuring that the structure is fit for purpose, as well as supporting later development and validation. In our collective experience and in discussions with other modellers, this step is an important part of the development process, but is under-reported in the literature. Since the practice focuses on qualitative issues, despite being quantitative in nature, we call this step qualitative parameterisation and provide an outline of its role in the BN development process.
- [306] arXiv:2402.12907 [ pdf , ps , html , other ]
-
Title: Incentive Compatibility for AI Alignment in Sociotechnical Systems: Positions and ProspectsComments: 13 pages, 2 figuresSubjects: Artificial Intelligence (cs.AI) ; Computers and Society (cs.CY); Computer Science and Game Theory (cs.GT); Human-Computer Interaction (cs.HC)
Abstract: The burgeoning integration of artificial intelligence (AI) into human society brings forth significant implications for societal governance and safety. While considerable strides have been made in addressing AI alignment challenges, existing methodologies primarily focus on technical facets, often neglecting the intricate sociotechnical nature of AI systems, which can lead to a misalignment between the development and deployment contexts. To this end, we posit a new problem worth exploring: Incentive Compatibility Sociotechnical Alignment Problem (ICSAP). We hope this can call for more researchers to explore how to leverage the principles of Incentive Compatibility (IC) from game theory to bridge the gap between technical and societal components to maintain AI consensus with human societies in different contexts. We further discuss three classical game problems for achieving IC: mechanism design, contract theory, and Bayesian persuasion, in addressing the perspectives, potentials, and challenges of solving ICSAP, and provide preliminary implementation conceptions.
- [307] arXiv:2402.13019 [ pdf , ps , other ]
-
Title: Improving Neural-based Classification with Logical Background KnowledgeComments: 9 pages, 3 figures, submitted to IJCAI 2024Subjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Symbolic Computation (cs.SC)
Abstract: Neurosymbolic AI is a growing field of research aiming to combine neural networks learning capabilities with the reasoning abilities of symbolic systems. This hybridization can take many shapes. In this paper, we propose a new formalism for supervised multi-label classification with propositional background knowledge. We introduce a new neurosymbolic technique called semantic conditioning at inference, which only constrains the system during inference while leaving the training unaffected. We discuss its theoritical and practical advantages over two other popular neurosymbolic techniques: semantic conditioning and semantic regularization. We develop a new multi-scale methodology to evaluate how the benefits of a neurosymbolic technique evolve with the scale of the network. We then evaluate experimentally and compare the benefits of all three techniques across model scales on several datasets. Our results demonstrate that semantic conditioning at inference can be used to build more accurate neural-based systems with fewer resources while guaranteeing the semantic consistency of outputs.
- [308] arXiv:2402.13058 [ pdf , ps , other ]
-
Title: Random Graph Set and Evidence Pattern Reasoning ModelSubjects: Artificial Intelligence (cs.AI)
Abstract: Evidence theory is widely used in decision-making and reasoning systems. In previous research, Transferable Belief Model (TBM) is a commonly used evidential decision making model, but TBM is a non-preference model. In order to better fit the decision making goals, the Evidence Pattern Reasoning Model (EPRM) is proposed. By defining pattern operators and decision making operators, corresponding preferences can be set for different tasks. Random Permutation Set (RPS) expands order information for evidence theory. It is hard for RPS to characterize the complex relationship between samples such as cycling, paralleling relationships. Therefore, Random Graph Set (RGS) were proposed to model complex relationships and represent more event types. In order to illustrate the significance of RGS and EPRM, an experiment of aircraft velocity ranking was designed and 10,000 cases were simulated. The implementation of EPRM called Conflict Resolution Decision optimized 18.17\% of the cases compared to Mean Velocity Decision, effectively improving the aircraft velocity ranking. EPRM provides a unified solution for evidence-based decision making.
- [309] arXiv:2402.13219 [ pdf , ps , html , other ]
-
Title: Analyzing Operator States and the Impact of AI-Enhanced Decision Support in Control Rooms: A Human-in-the-Loop Specialized Reinforcement Learning Framework for Intervention StrategiesAmmar N. Abbas , Chidera W. Amazu , Joseph Mietkiewicz , Houda Briwa , Andres Alonzo Perez , Gabriele Baldissone , Micaela Demichela , Georgios G. Chasparis , John D. Kelleher , Maria Chiara LevaSubjects: Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Systems and Control (eess.SY)
Abstract: In complex industrial and chemical process control rooms, effective decision-making is crucial for safety and efficiency. The experiments in this paper evaluate the impact and applications of an AI-based decision support system integrated into an improved human-machine interface, using dynamic influence diagrams, a hidden Markov model, and deep reinforcement learning. The enhanced support system aims to reduce operator workload, improve situational awareness, and provide different intervention strategies to the operator adapted to the current state of both the system and human performance. Such a system can be particularly useful in cases of information overload when many alarms and inputs are presented all within the same time window, or for junior operators during training. A comprehensive cross-data analysis was conducted, involving 47 participants and a diverse range of data sources such as smartwatch metrics, eye-tracking data, process logs, and responses from questionnaires. The results indicate interesting insights regarding the effectiveness of the approach in aiding decision-making, decreasing perceived workload, and increasing situational awareness for the scenarios considered. Additionally, the results provide valuable insights to compare differences between styles of information gathering when using the system by individual participants. These findings are particularly relevant when predicting the overall performance of the individual participant and their capacity to successfully handle a plant upset and the alarms connected to it using process and human-machine interaction logs in real-time. These predictions enable the development of more effective intervention strategies.
- [310] arXiv:2402.13264 [ pdf , ps , html , other ]
-
Title: KGroot: Enhancing Root Cause Analysis through Knowledge Graphs and Graph Convolutional Neural NetworksSubjects: Artificial Intelligence (cs.AI)
Abstract: Fault localization is challenging in online micro-service due to the wide variety of monitoring data volume, types, events and complex interdependencies in service and components. Faults events in services are propagative and can trigger a cascade of alerts in a short period of time. In the industry, fault localization is typically conducted manually by experienced personnel. This reliance on experience is unreliable and lacks automation. Different modules present information barriers during manual localization, making it difficult to quickly align during urgent faults. This inefficiency lags stability assurance to minimize fault detection and repair time. Though actionable methods aimed to automatic the process, the accuracy and efficiency are less than satisfactory. The precision of fault localization results is of paramount importance as it underpins engineers trust in the diagnostic conclusions, which are derived from multiple perspectives and offer comprehensive insights. Therefore, a more reliable method is required to automatically identify the associative relationships among fault events and propagation path. To achieve this, KGroot uses event knowledge and the correlation between events to perform root cause reasoning by integrating knowledge graphs and GCNs for RCA. FEKG is built based on historical data, an online graph is constructed in real-time when a failure event occurs, and the similarity between each knowledge graph and online graph is compared using GCNs to pinpoint the fault type through a ranking strategy. Comprehensive experiments demonstrate KGroot can locate the root cause with accuracy of 93.5% top 3 potential causes in second-level. This performance matches the level of real-time fault diagnosis in the industrial environment and significantly surpasses state-of-the-art baselines in RCA in terms of effectiveness and efficiency.
- [311] arXiv:2402.13272 [ pdf , ps , html , other ]
-
Title: Spontaneous Theory of Mind for Artificial IntelligenceSubjects: Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC)
Abstract: Existing approaches to Theory of Mind (ToM) in Artificial Intelligence (AI) overemphasize prompted, or cue-based, ToM, which may limit our collective ability to develop Artificial Social Intelligence (ASI). Drawing from research in computer science, cognitive science, and related disciplines, we contrast prompted ToM with what we call spontaneous ToM -- reasoning about others' mental states that is grounded in unintentional, possibly uncontrollable cognitive functions. We argue for a principled approach to studying and developing AI ToM and suggest that a robust, or general, ASI will respond to prompts \textit{and} spontaneously engage in social reasoning.
- [312] arXiv:2402.13273 [ pdf , ps , html , other ]
-
Title: Operational Collective Intelligence of Humans and MachinesSubjects: Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC)
Abstract: We explore the use of aggregative crowdsourced forecasting (ACF) as a mechanism to help operationalize ``collective intelligence'' of human-machine teams for coordinated actions. We adopt the definition for Collective Intelligence as: ``A property of groups that emerges from synergies among data-information-knowledge, software-hardware, and individuals (those with new insights as well as recognized authorities) that enables just-in-time knowledge for better decisions than these three elements acting alone.'' Collective Intelligence emerges from new ways of connecting humans and AI to enable decision-advantage, in part by creating and leveraging additional sources of information that might otherwise not be included. Aggregative crowdsourced forecasting (ACF) is a recent key advancement towards Collective Intelligence wherein predictions (X\% probability that Y will happen) and rationales (why I believe it is this probability that X will happen) are elicited independently from a diverse crowd, aggregated, and then used to inform higher-level decision-making. This research asks whether ACF, as a key way to enable Operational Collective Intelligence, could be brought to bear on operational scenarios (i.e., sequences of events with defined agents, components, and interactions) and decision-making, and considers whether such a capability could provide novel operational capabilities to enable new forms of decision-advantage.
- [313] arXiv:2402.13290 [ pdf , ps , html , other ]
-
Title: Grounding from an AI and Cognitive Science LensJournal-ref: IEEE Intelligent Systems, 2024Subjects: Artificial Intelligence (cs.AI)
Abstract: Grounding is a challenging problem, requiring a formal definition and different levels of abstraction. This article explores grounding from both cognitive science and machine learning perspectives. It identifies the subtleties of grounding, its significance for collaborative agents, and similarities and differences in grounding approaches in both communities. The article examines the potential of neuro-symbolic approaches tailored for grounding tasks, showcasing how they can more comprehensively address grounding. Finally, we discuss areas for further exploration and development in grounding.
- [314] arXiv:2402.13380 [ pdf , ps , html , other ]
-
Title: Toward TransfORmers: Revolutionizing the Solution of Mixed Integer Programs with TransformersSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Combinatorics (math.CO); Optimization and Control (math.OC); Machine Learning (stat.ML)
Abstract: In this study, we introduce an innovative deep learning framework that employs a transformer model to address the challenges of mixed-integer programs, specifically focusing on the Capacitated Lot Sizing Problem (CLSP). Our approach, to our knowledge, is the first to utilize transformers to predict the binary variables of a mixed-integer programming (MIP) problem. Specifically, our approach harnesses the encoder decoder transformer's ability to process sequential data, making it well-suited for predicting binary variables indicating production setup decisions in each period of the CLSP. This problem is inherently dynamic, and we need to handle sequential decision making under constraints. We present an efficient algorithm in which CLSP solutions are learned through a transformer neural network. The proposed post-processed transformer algorithm surpasses the state-of-the-art solver, CPLEX and Long Short-Term Memory (LSTM) in solution time, optimal gap, and percent infeasibility over 240K benchmark CLSP instances tested. After the ML model is trained, conducting inference on the model, including post-processing, reduces the MIP into a linear program (LP). This transforms the ML-based algorithm, combined with an LP solver, into a polynomial-time approximation algorithm to solve a well-known NP-Hard problem, with almost perfect solution quality.
- [315] arXiv:2402.13399 [ pdf , ps , html , other ]
-
Title: Learning and Sustaining Shared Normative Systems via Bayesian Rule Induction in Markov GamesComments: Accepted to the 23rd International Conference on Autonomous Agents and Multi-Agent Systems, 8 pages (excl. references), 6 figures/tables, (Appendix: 7 pages, 6 figures/tables). Code available at: this https URLSubjects: Artificial Intelligence (cs.AI)
Abstract: A universal feature of human societies is the adoption of systems of rules and norms in the service of cooperative ends. How can we build learning agents that do the same, so that they may flexibly cooperate with the human institutions they are embedded in? We hypothesize that agents can achieve this by assuming there exists a shared set of norms that most others comply with while pursuing their individual desires, even if they do not know the exact content of those norms. By assuming shared norms, a newly introduced agent can infer the norms of an existing population from observations of compliance and violation. Furthermore, groups of agents can converge to a shared set of norms, even if they initially diverge in their beliefs about what the norms are. This in turn enables the stability of the normative system: since agents can bootstrap common knowledge of the norms, this leads the norms to be widely adhered to, enabling new entrants to rapidly learn those norms. We formalize this framework in the context of Markov games and demonstrate its operation in a multi-agent environment via approximately Bayesian rule induction of obligative and prohibitive norms. Using our approach, agents are able to rapidly learn and sustain a variety of cooperative institutions, including resource management norms and compensation for pro-social labor, promoting collective welfare while still allowing agents to act in their own interests.
- [316] arXiv:2402.13419 [ pdf , ps , html , other ]
-
Title: Reward Bound for Behavioral Guarantee of Model-based Planning AgentsComments: To be published in ICLR 24 tiny paper trackSubjects: Artificial Intelligence (cs.AI)
Abstract: Recent years have seen an emerging interest in the trustworthiness of machine learning-based agents in the wild, especially in robotics, to provide safety assurance for the industry. Obtaining behavioral guarantees for these agents remains an important problem. In this work, we focus on guaranteeing a model-based planning agent reaches a goal state within a specific future time step. We show that there exists a lower bound for the reward at the goal state, such that if the said reward is below that bound, it is impossible to obtain such a guarantee. By extension, we show how to enforce preferences over multiple goals.
- [317] arXiv:2402.13427 [ pdf , ps , other ]
-
Title: Quantitative causality, causality-guided scientific discovery, and causal machine learningComments: 10 pages, 3 figures. To appear in Ocean-Land-Atmosphere Research. arXiv admin note: substantial text overlap with arXiv:2112.14839Subjects: Artificial Intelligence (cs.AI) ; Data Analysis, Statistics and Probability (physics.data-an)
Abstract: It has been said, arguably, that causality analysis should pave a promising way to interpretable deep learning and generalization. Incorporation of causality into artificial intelligence (AI) algorithms, however, is challenged with its vagueness, non-quantitiveness, computational inefficiency, etc. During the past 18 years, these challenges have been essentially resolved, with the establishment of a rigorous formalism of causality analysis initially motivated from atmospheric predictability. This not only opens a new field in the atmosphere-ocean science, namely, information flow, but also has led to scientific discoveries in other disciplines, such as quantum mechanics, neuroscience, financial economics, etc., through various applications. This note provides a brief review of the decade-long effort, including a list of major theoretical results, a sketch of the causal deep learning framework, and some representative real-world applications in geoscience pertaining to this journal, such as those on the anthropogenic cause of global warming, the decadal prediction of El Niño Modoki, the forecasting of an extreme drought in China, among others.
- [318] arXiv:2402.13440 [ pdf , ps , html , other ]
-
Title: A Neuro-Symbolic Approach to Multi-Agent RL for Interpretability and Probabilistic Decision MakingChitra Subramanian , Miao Liu , Naweed Khan , Jonathan Lenchner , Aporva Amarnath , Sarathkrishna Swaminathan , Ryan Riegel , Alexander GraySubjects: Artificial Intelligence (cs.AI) ; Neural and Evolutionary Computing (cs.NE)
Abstract: Multi-agent reinforcement learning (MARL) is well-suited for runtime decision-making in optimizing the performance of systems where multiple agents coexist and compete for shared resources. However, applying common deep learning-based MARL solutions to real-world problems suffers from issues of interpretability, sample efficiency, partial observability, etc. To address these challenges, we present an event-driven formulation, where decision-making is handled by distributed co-operative MARL agents using neuro-symbolic methods. The recently introduced neuro-symbolic Logical Neural Networks (LNN) framework serves as a function approximator for the RL, to train a rules-based policy that is both logical and interpretable by construction. To enable decision-making under uncertainty and partial observability, we developed a novel probabilistic neuro-symbolic framework, Probabilistic Logical Neural Networks (PLNN), which combines the capabilities of logical reasoning with probabilistic graphical models. In PLNN, the upward/downward inference strategy, inherited from LNN, is coupled with belief bounds by setting the activation function for the logical operator associated with each neural network node to a probability-respecting generalization of the Fréchet inequalities. These PLNN nodes form the unifying element that combines probabilistic logic and Bayes Nets, permitting inference for variables with unobserved states. We demonstrate our contributions by addressing key MARL challenges for power sharing in a system-on-chip application.
- [319] arXiv:2402.13582 [ pdf , ps , other ]
-
Title: Mastering the Game of Guandan with Deep Reinforcement Learning and Behavior RegulatingSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Games are a simplified model of reality and often serve as a favored platform for Artificial Intelligence (AI) research. Much of the research is concerned with game-playing agents and their decision making processes. The game of Guandan (literally, "throwing eggs") is a challenging game where even professional human players struggle to make the right decision at times. In this paper we propose a framework named GuanZero for AI agents to master this game using Monte-Carlo methods and deep neural networks. The main contribution of this paper is about regulating agents' behavior through a carefully designed neural network encoding scheme. We then demonstrate the effectiveness of the proposed framework by comparing it with state-of-the-art approaches.
- [320] arXiv:2402.13615 [ pdf , ps , html , other ]
-
Title: Analyizing the Conjunction Fallacy as a FactComments: book chapterJournal-ref: In: Veloz, Khrennikov, Toni, Castillo (eds) Trends and Challenges in Cognitive Modeling. STEAM-H: Springer (2023)Subjects: Artificial Intelligence (cs.AI) ; Probability (math.PR); Adaptation and Self-Organizing Systems (nlin.AO)
Abstract: Since the seminal paper by Tversky and Kahneman, the conjunction fallacy has been the subject of multiple debates and become a fundamental challenge for cognitive theories in decision-making. In this article, we take a rather uncommon perspective on this phenomenon. Instead of trying to explain the nature or causes of the conjunction fallacy (intensional definition), we analyze its range of factual possibilities (extensional definition). We show that the majority of research on the conjunction fallacy, according to our sample of experiments reviewed which covers literature between 1983 and 2016, has focused on a narrow part of the a priori factual possibilities, implying that explanations of the conjunction fallacy are fundamentally biased by the short scope of possibilities explored. The latter is a rather curious aspect of the research evolution in the conjunction fallacy considering that the very nature of it is motivated by extensional considerations.
- [321] arXiv:2402.13782 [ pdf , ps , other ]
-
Title: Semirings for Probabilistic and Neuro-Symbolic Logic ProgrammingJournal-ref: International Journal of Approximate Reasoning (2024): 109130Subjects: Artificial Intelligence (cs.AI)
Abstract: The field of probabilistic logic programming (PLP) focuses on integrating probabilistic models into programming languages based on logic. Over the past 30 years, numerous languages and frameworks have been developed for modeling, inference and learning in probabilistic logic programs. While originally PLP focused on discrete probability, more recent approaches have incorporated continuous distributions as well as neural networks, effectively yielding neural-symbolic methods. We provide a unified algebraic perspective on PLP, showing that many if not most of the extensions of PLP can be cast within a common algebraic logic programming framework, in which facts are labeled with elements of a semiring and disjunction and conjunction are replaced by addition and multiplication. This does not only hold for the PLP variations itself but also for the underlying execution mechanism that is based on (algebraic) model counting.
- [322] arXiv:2402.13785 [ pdf , ps , html , other ]
-
Title: Synthesis of Hierarchical Controllers Based on Deep Reinforcement Learning PoliciesComments: 19 pages main text, 17 pages Appendix (excluding references)Subjects: Artificial Intelligence (cs.AI)
Abstract: We propose a novel approach to the problem of controller design for environments modeled as Markov decision processes (MDPs). Specifically, we consider a hierarchical MDP a graph with each vertex populated by an MDP called a "room". We first apply deep reinforcement learning (DRL) to obtain low-level policies for each room, scaling to large rooms of unknown structure. We then apply reactive synthesis to obtain a high-level planner that chooses which low-level policy to execute in each room. The central challenge in synthesizing the planner is the need for modeling rooms. We address this challenge by developing a DRL procedure to train concise "latent" policies together with PAC guarantees on their performance. Unlike previous approaches, ours circumvents a model distillation step. Our approach combats sparse rewards in DRL and enables reusability of low-level policies. We demonstrate feasibility in a case study involving agent navigation amid moving obstacles.
- [323] arXiv:2402.13846 [ pdf , ps , other ]
-
Title: Large Language Models are Advanced AnonymizersSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Cryptography and Security (cs.CR)
Abstract: Recent work in privacy research on large language models has shown that they achieve near human-level performance at inferring personal data from real-world online texts. With consistently increasing model capabilities, existing text anonymization methods are currently lacking behind regulatory requirements and adversarial threats. This raises the question of how individuals can effectively protect their personal data in sharing online texts. In this work, we take two steps to answer this question: We first present a new setting for evaluating anonymizations in the face of adversarial LLMs inferences, allowing for a natural measurement of anonymization performance while remedying some of the shortcomings of previous metrics. We then present our LLM-based adversarial anonymization framework leveraging the strong inferential capabilities of LLMs to inform our anonymization procedure. In our experimental evaluation, we show on real-world and synthetic online texts how adversarial anonymization outperforms current industry-grade anonymizers both in terms of the resulting utility and privacy.
- [324] arXiv:2402.13914 [ pdf , ps , html , other ]
-
Title: Explain to Question not to JustifySubjects: Artificial Intelligence (cs.AI) ; Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Abstract: Explainable Artificial Intelligence (XAI) is a young but very promising field of research. Unfortunately, the progress in this field is currently slowed down by divergent and incompatible goals. In this paper, we separate various threads tangled within the area of XAI into two complementary cultures of human/value-oriented explanations (BLUE XAI) and model/validation-oriented explanations (RED XAI). We also argue that the area of RED XAI is currently under-explored and hides great opportunities and potential for important research necessary to ensure the safety of AI systems. We conclude this paper by presenting promising challenges in this area.
- [325] arXiv:2402.13927 [ pdf , ps , html , other ]
-
Title: The Delusional Hedge Algorithm as a Model of Human Learning from Diverse OpinionsSubjects: Artificial Intelligence (cs.AI)
Abstract: Whereas cognitive models of learning often assume direct experience with both the features of an event and with a true label or outcome, much of everyday learning arises from hearing the opinions of others, without direct access to either the experience or the ground truth outcome. We consider how people can learn which opinions to trust in such scenarios by extending the hedge algorithm: a classic solution for learning from diverse information sources. We first introduce a semi-supervised variant we call the delusional hedge capable of learning from both supervised and unsupervised experiences. In two experiments, we examine the alignment between human judgments and predictions from the standard hedge, the delusional hedge, and a heuristic baseline model. Results indicate that humans effectively incorporate both labeled and unlabeled information in a manner consistent with the delusional hedge algorithm -- suggesting that human learners not only gauge the accuracy of information sources but also their consistency with other reliable sources. The findings advance our understanding of human learning from diverse opinions, with implications for the development of algorithms that better capture how people learn to weigh conflicting information sources.
- [326] arXiv:2402.14083 [ pdf , ps , html , other ]
-
Title: Beyond A*: Better Planning with Transformers via Search Dynamics BootstrappingLucas Lehnert , Sainbayar Sukhbaatar , DiJia Su , Qinqing Zheng , Paul Mcvay , Michael Rabbat , Yuandong TianSubjects: Artificial Intelligence (cs.AI)
Abstract: While Transformers have enabled tremendous progress in various application settings, such architectures still trail behind traditional symbolic planners for solving complex decision making tasks. In this work, we demonstrate how to train Transformers to solve complex planning tasks. This is accomplished by training an encoder-decoder Transformer model to predict the search dynamics of the $A^*$ search algorithm. We fine tune this model to obtain a Searchformer, a Transformer model that optimally solves previously unseen Sokoban puzzles 93.7% of the time, while using up to 26.8% fewer search steps than the $A^*$ implementation that was used for training initially. In our training method, $A^*$'s search dynamics are expressed as a token sequence outlining when task states are added and removed into the search tree during symbolic planning. Searchformer significantly outperforms baselines that predict the optimal plan directly with a 5-10$\times$ smaller model size and a 10$\times$ smaller training dataset. Lastly, we demonstrate how Searchformer scales to larger and more complex decision making tasks with improved percentage of solved tasks and shortened search dynamics.
- [327] arXiv:2402.14090 [ pdf , ps , html , other ]
-
Title: Social Environment DesignEdwin Zhang , Sadie Zhao , Tonghan Wang , Safwan Hossain , Henry Gasztowtt , Stephan Zheng , David C. Parkes , Milind Tambe , Yiling ChenComments: Code at this https URLSubjects: Artificial Intelligence (cs.AI) ; General Economics (econ.GN); Machine Learning (stat.ML)
Abstract: Artificial Intelligence (AI) holds promise as a technology that can be used to improve government and economic policy-making. This paper proposes a new research agenda towards this end by introducing Social Environment Design, a general framework for the use of AI for automated policy-making that connects with the Reinforcement Learning, EconCS, and Computational Social Choice communities. The framework seeks to capture general economic environments, includes voting on policy objectives, and gives a direction for the systematic analysis of government and economic policy through AI simulation. We highlight key open problems for future research in AI-based policy-making. By solving these challenges, we hope to achieve various social welfare objectives, thereby promoting more ethical and responsible decision making.
- [328] arXiv:2402.14244 [ pdf , ps , html , other ]
-
Title: MENTOR: Guiding Hierarchical Reinforcement Learning with Human Feedback and Dynamic Distance ConstraintSubjects: Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Abstract: Hierarchical reinforcement learning (HRL) provides a promising solution for complex tasks with sparse rewards of intelligent agents, which uses a hierarchical framework that divides tasks into subgoals and completes them sequentially. However, current methods struggle to find suitable subgoals for ensuring a stable learning process. Without additional guidance, it is impractical to rely solely on exploration or heuristics methods to determine subgoals in a large goal space. To address the issue, We propose a general hierarchical reinforcement learning framework incorporating human feedback and dynamic distance constraints (MENTOR). MENTOR acts as a "mentor", incorporating human feedback into high-level policy learning, to find better subgoals. As for low-level policy, MENTOR designs a dual policy for exploration-exploitation decoupling respectively to stabilize the training. Furthermore, although humans can simply break down tasks into subgoals to guide the right learning direction, subgoals that are too difficult or too easy can still hinder downstream learning efficiency. We propose the Dynamic Distance Constraint (DDC) mechanism dynamically adjusting the space of optional subgoals. Thus MENTOR can generate subgoals matching the low-level policy learning process from easy to hard. Extensive experiments demonstrate that MENTOR uses a small amount of human feedback to achieve significant improvement in complex tasks with sparse rewards.
- [329] arXiv:2402.14424 [ pdf , ps , html , other ]
-
Title: Automating Psychological Hypothesis Generation with AI: Large Language Models Meet Causal GraphSubjects: Artificial Intelligence (cs.AI) ; Computers and Society (cs.CY)
Abstract: Leveraging the synergy between causal knowledge graphs and a large language model (LLM), our study introduces a groundbreaking approach for computational hypothesis generation in psychology. We analyzed 43,312 psychology articles using a LLM to extract causal relation pairs. This analysis produced a specialized causal graph for psychology. Applying link prediction algorithms, we generated 130 potential psychological hypotheses focusing on `well-being', then compared them against research ideas conceived by doctoral scholars and those produced solely by the LLM. Interestingly, our combined approach of a LLM and causal graphs mirrored the expert-level insights in terms of novelty, clearly surpassing the LLM-only hypotheses (t(59) = 3.34, p=0.007 and t(59) = 4.32, p<0.001, respectively). This alignment was further corroborated using deep semantic analysis. Our results show that combining LLM with machine learning techniques such as causal knowledge graphs can revolutionize automated discovery in psychology, extracting novel insights from the extensive literature. This work stands at the crossroads of psychology and artificial intelligence, championing a new enriched paradigm for data-driven hypothesis generation in psychological research.
- [330] arXiv:2402.14460 [ pdf , ps , html , other ]
-
Title: Reframing the Expected Free Energy: Four Formulations and a UnificationComments: 17 pages, 2 figuresSubjects: Artificial Intelligence (cs.AI)
Abstract: Active inference is a leading theory of perception, learning and decision making, which can be applied to neuroscience, robotics, psychology, and machine learning. Active inference is based on the expected free energy, which is mostly justified by the intuitive plausibility of its formulations, e.g., the risk plus ambiguity and information gain / pragmatic value formulations. This paper seek to formalize the problem of deriving these formulations from a single root expected free energy definition, i.e., the unification problem. Then, we study two settings, each one having its own root expected free energy definition. In the first setting, no justification for the expected free energy has been proposed to date, but all the formulations can be recovered from it. However, in this setting, the agent cannot have arbitrary prior preferences over observations. Indeed, only a limited class of prior preferences over observations is compatible with the likelihood mapping of the generative model. In the second setting, a justification of the root expected free energy definition is known, but this setting only accounts for two formulations, i.e., the risk over states plus ambiguity and entropy plus expected energy formulations.
- [331] arXiv:2402.14580 [ pdf , ps , html , other ]
-
Title: Savvy: Trustworthy Autonomous Vehicles ArchitectureSubjects: Artificial Intelligence (cs.AI) ; Systems and Control (eess.SY)
Abstract: The increasing interest in Autonomous Vehicles (AV) is notable due to business, safety, and performance reasons. While there is salient success in recent AV architectures, hinging on the advancements in AI models, there is a growing number of fatal incidents that impedes full AVs from going mainstream. This calls for the need to revisit the fundamentals of building safety-critical AV architectures. However, this direction should not deter leveraging the power of AI. To this end, we propose Savvy, a new trustworthy intelligent AV architecture that achieves the best of both worlds. Savvy makes a clear separation between the control plane and the data plane to guarantee the safety-first principles. The former assume control to ensure safety using design-time defined rules, while launching the latter for optimizing decisions as much as possible within safety time-bounds. This is achieved through guided Time-aware predictive quality degradation (TPQD): using dynamic ML models that can be tuned to provide either richer or faster outputs based on the available safety time bounds. For instance, Savvy allows to safely identify an elephant as an obstacle (a mere object) the earliest possible, rather than optimally recognizing it as an elephant when it is too late. This position paper presents the Savvy's motivations and concept, whereas empirical evaluation is a work in progress.
- [332] arXiv:2402.14596 [ pdf , ps , other ]
-
Title: The Role of LLMs in Sustainable Smart Cities: Applications, Challenges, and Future DirectionsSubjects: Artificial Intelligence (cs.AI)
Abstract: Smart cities stand as pivotal components in the ongoing pursuit of elevating urban living standards, facilitating the rapid expansion of urban areas while efficiently managing resources through sustainable and scalable innovations. In this regard, as emerging technologies like Artificial Intelligence (AI), the Internet of Things (IoT), big data analytics, and fog and edge computing have become increasingly prevalent, smart city applications grapple with various challenges, including the potential for unauthorized disclosure of confidential and sensitive data. The seamless integration of emerging technologies has played a vital role in sustaining the dynamic pace of their development. This paper explores the substantial potential and applications of Deep Learning (DL), Federated Learning (FL), IoT, Blockchain, Natural Language Processing (NLP), and large language models (LLMs) in optimizing ICT processes within smart cities. We aim to spotlight the vast potential of these technologies as foundational elements that technically strengthen the realization and advancement of smart cities, underscoring their significance in driving innovation within this transformative urban milieu. Our discourse culminates with an exploration of the formidable challenges that DL, FL, IoT, Blockchain, NLP, and LLMs face within these contexts, and we offer insights into potential future directions.
- [333] arXiv:2402.14600 [ pdf , ps , html , other ]
-
Title: Diffusion Model-Based Multiobjective Optimization for Gasoline Blending SchedulingSubjects: Artificial Intelligence (cs.AI)
Abstract: Gasoline blending scheduling uses resource allocation and operation sequencing to meet a refinery's production requirements. The presence of nonlinearity, integer constraints, and a large number of decision variables adds complexity to this problem, posing challenges for traditional and evolutionary algorithms. This paper introduces a novel multiobjective optimization approach driven by a diffusion model (named DMO), which is designed specifically for gasoline blending scheduling. To address integer constraints and generate feasible schedules, the diffusion model creates multiple intermediate distributions between Gaussian noise and the feasible domain. Through iterative processes, the solutions transition from Gaussian noise to feasible schedules while optimizing the objectives using the gradient descent method. DMO achieves simultaneous objective optimization and constraint adherence. Comparative tests are conducted to evaluate DMO's performance across various scales. The experimental results demonstrate that DMO surpasses state-of-the-art multiobjective evolutionary algorithms in terms of efficiency when solving gasoline blending scheduling problems.
- [334] arXiv:2402.14744 [ pdf , ps , html , other ]
-
Title: Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility GenerationJiawei Wang , Renhe Jiang , Chuang Yang , Zengqing Wu , Makoto Onizuka , Ryosuke Shibasaki , Chuan XiaoComments: Source codes will be released soonSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Computers and Society (cs.CY); Machine Learning (cs.LG)
Abstract: This paper introduces a novel approach using Large Language Models (LLMs) integrated into an agent framework for flexible and efficient personal mobility generation. LLMs overcome the limitations of previous models by efficiently processing semantic data and offering versatility in modeling various tasks. Our approach addresses the critical need to align LLMs with real-world urban mobility data, focusing on three research questions: aligning LLMs with rich activity data, developing reliable activity generation strategies, and exploring LLM applications in urban mobility. The key technical contribution is a novel LLM agent framework that accounts for individual activity patterns and motivations, including a self-consistency approach to align LLMs with real-world activity data and a retrieval-augmented strategy for interpretable activity generation. In experimental studies, comprehensive validation is performed using real-world data. This research marks the pioneering work of designing an LLM agent framework for activity generation based on real-world human activity data, offering a promising tool for urban mobility analysis.
- [335] arXiv:2402.14757 [ pdf , ps , html , other ]
-
Title: SHM-Traffic: DRL and Transfer learning based UAV Control for Structural Health Monitoring of Bridges with TrafficSubjects: Artificial Intelligence (cs.AI)
Abstract: This work focuses on using advanced techniques for structural health monitoring (SHM) for bridges with Traffic. We propose an approach using deep reinforcement learning (DRL)-based control for Unmanned Aerial Vehicle (UAV). Our approach conducts a concrete bridge deck survey while traffic is ongoing and detects cracks. The UAV performs the crack detection, and the location of cracks is initially unknown. We use two edge detection techniques. First, we use canny edge detection for crack detection. We also use a Convolutional Neural Network (CNN) for crack detection and compare it with canny edge detection. Transfer learning is applied using CNN with pre-trained weights obtained from a crack image dataset. This enables the model to adapt and improve its performance in identifying and localizing cracks. Proximal Policy Optimization (PPO) is applied for UAV control and bridge surveys. The experimentation across various scenarios is performed to evaluate the performance of the proposed methodology. Key metrics such as task completion time and reward convergence are observed to gauge the effectiveness of the approach. We observe that the Canny edge detector offers up to 40\% lower task completion time, while the CNN excels in up to 12\% better damage detection and 1.8 times better rewards.
- [336] arXiv:2402.15075 [ pdf , ps , other ]
-
Title: Stacking Factorizing Partitioned Expressions in Hybrid Bayesian Network ModelsSubjects: Artificial Intelligence (cs.AI)
Abstract: Hybrid Bayesian networks (HBN) contain complex conditional probabilistic distributions (CPD) specified as partitioned expressions over discrete and continuous variables. The size of these CPDs grows exponentially with the number of parent nodes when using discrete inference, resulting in significant inefficiency. Normally, an effective way to reduce the CPD size is to use a binary factorization (BF) algorithm to decompose the statistical or arithmetic functions in the CPD by factorizing the number of connected parent nodes to sets of size two. However, the BF algorithm was not designed to handle partitioned expressions. Hence, we propose a new algorithm called stacking factorization (SF) to decompose the partitioned expressions. The SF algorithm creates intermediate nodes to incrementally reconstruct the densities in the original partitioned expression, allowing no more than two continuous parent nodes to be connected to each child node in the resulting HBN. SF can be either used independently or combined with the BF algorithm. We show that the SF+BF algorithm significantly reduces the CPD size and contributes to lowering the tree-width of a model, thus improving efficiency.
- [337] arXiv:2402.15140 [ pdf , ps , html , other ]
-
Title: A Relation-Interactive Approach for Message Passing in Hyper-relational Knowledge GraphsSubjects: Artificial Intelligence (cs.AI)
Abstract: Hyper-relational knowledge graphs (KGs) contain additional key-value pairs, providing more information about the relations. In many scenarios, the same relation can have distinct key-value pairs, making the original triple fact more recognizable and specific. Prior studies on hyper-relational KGs have established a solid standard method for hyper-relational graph encoding. In this work, we propose a message-passing-based graph encoder with global relation structure awareness ability, which we call ReSaE. Compared to the prior state-of-the-art approach, ReSaE emphasizes the interaction of relations during message passing process and optimizes the readout structure for link prediction tasks. Overall, ReSaE gives a encoding solution for hyper-relational KGs and ensures stronger performance on downstream link prediction tasks. Our experiments demonstrate that ReSaE achieves state-of-the-art performance on multiple link prediction benchmarks. Furthermore, we also analyze the influence of different model structures on model performance.
- [338] arXiv:2402.15444 [ pdf , ps , html , other ]
-
Title: Unleashing the Power of Imbalanced Modality Information for Multi-modal Knowledge Graph CompletionComments: Accepted by LREC-COLING 2024Subjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
Abstract: Multi-modal knowledge graph completion (MMKGC) aims to predict the missing triples in the multi-modal knowledge graphs by incorporating structural, visual, and textual information of entities into the discriminant models. The information from different modalities will work together to measure the triple plausibility. Existing MMKGC methods overlook the imbalance problem of modality information among entities, resulting in inadequate modal fusion and inefficient utilization of the raw modality information. To address the mentioned problems, we propose Adaptive Multi-modal Fusion and Modality Adversarial Training (AdaMF-MAT) to unleash the power of imbalanced modality information for MMKGC. AdaMF-MAT achieves multi-modal fusion with adaptive modality weights and further generates adversarial samples by modality-adversarial training to enhance the imbalanced modality information. Our approach is a co-design of the MMKGC model and training strategy which can outperform 19 recent MMKGC methods and achieve new state-of-the-art results on three public MMKGC benchmarks. Our code and data have been released at this https URL .
- [339] arXiv:2402.15445 [ pdf , ps , html , other ]
-
Title: Can we forget how we learned? Doxastic redundancy in iterated belief revisionComments: formerly part of arXiv:2305.09200Subjects: Artificial Intelligence (cs.AI)
Abstract: How information was acquired may become irrelevant. An obvious case is when something is confirmed many times. In terms of iterated belief revision, a specific revision may become irrelevant in presence of others. Simple repetitions are an example, but not the only case when this happens. Sometimes, a revision becomes redundant even in presence of none equal, or even no else implying it. A necessary and sufficient condition for the redundancy of the first of a sequence of lexicographic revisions is given. The problem is coNP-complete even with two propositional revisions only. Complexity is the same in the Horn case but only with an unbounded number of revisions: it becomes polynomial with two revisions. Lexicographic revisions are not only relevant by themselves, but also because sequences of them are the most compact of the common mechanisms used to represent the state of an iterated revision process. Shortening sequences of lexicographic revisions is shortening the most compact representations of iterated belief revision states.
- [340] arXiv:2402.15506 [ pdf , ps , html , other ]
-
Title: AgentOhana: Design Unified Data and Training Pipeline for Effective Agent LearningJianguo Zhang , Tian Lan , Rithesh Murthy , Zhiwei Liu , Weiran Yao , Juntao Tan , Thai Hoang , Liangwei Yang , Yihao Feng , Zuxin Liu , Tulika Awalgaonkar , Juan Carlos Niebles , Silvio Savarese , Shelby Heinecke , Huan Wang , Caiming XiongComments: Add GitHub repo link at \url{ this https URL } and HuggingFace model link at \url{ this https URL }Subjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Autonomous agents powered by large language models (LLMs) have garnered significant research attention. However, fully harnessing the potential of LLMs for agent-based tasks presents inherent challenges due to the heterogeneous nature of diverse data sources featuring multi-turn trajectories. In this paper, we introduce \textbf{AgentOhana} as a comprehensive solution to address these challenges. \textit{AgentOhana} aggregates agent trajectories from distinct environments, spanning a wide array of scenarios. It meticulously standardizes and unifies these trajectories into a consistent format, streamlining the creation of a generic data loader optimized for agent training. Leveraging the data unification, our training pipeline maintains equilibrium across different data sources and preserves independent randomness across devices during dataset partitioning and model training. Additionally, we present \textbf{xLAM-v0.1}, a large action model tailored for AI agents, which demonstrates exceptional performance across various benchmarks. Begin the exploration at \url{ this https URL }.
- [341] arXiv:2402.15515 [ pdf , ps , other ]
-
Title: Feasibility of Identifying Factors Related to Alzheimer's Disease and Related Dementia in Real-World DataAokun Chen , Qian Li , Yu Huang , Yongqiu Li , Yu-neng Chuang , Xia Hu , Serena Guo , Yonghui Wu , Yi Guo , Jiang BianSubjects: Artificial Intelligence (cs.AI) ; Quantitative Methods (q-bio.QM); Applications (stat.AP)
Abstract: A comprehensive view of factors associated with AD/ADRD will significantly aid in studies to develop new treatments for AD/ADRD and identify high-risk populations and patients for prevention efforts. In our study, we summarized the risk factors for AD/ADRD by reviewing existing meta-analyses and review articles on risk and preventive factors for AD/ADRD. In total, we extracted 477 risk factors in 10 categories from 537 studies. We constructed an interactive knowledge map to disseminate our study results. Most of the risk factors are accessible from structured Electronic Health Records (EHRs), and clinical narratives show promise as information sources. However, evaluating genomic risk factors using RWD remains a challenge, as genetic testing for AD/ADRD is still not a common practice and is poorly documented in both structured and unstructured EHRs. Considering the constantly evolving research on AD/ADRD risk factors, literature mining via NLP methods offers a solution to automatically update our knowledge map.
- [342] arXiv:2402.15521 [ pdf , ps , html , other ]
-
Title: HKD-SHO: A hybrid smart home system based on knowledge-based and data-driven servicesComments: keywords: Hybrid System, Knowledge Representation, Reinforcement Learning, Services, Smart HomeSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: A smart home is realized by setting up various services. Several methods have been proposed to create smart home services, which can be divided into knowledge-based and data-driven approaches. However, knowledge-based approaches usually require manual input from the inhabitant, which can be complicated if the physical phenomena of the concerned environment states are complex, and the inhabitant does not know how to adjust related actuators to achieve the target values of the states monitored by services. Moreover, machine learning-based data-driven approaches that we are interested in are like black boxes and cannot show the inhabitant in which situations certain services proposed certain actuators' states. To solve these problems, we propose a hybrid system called HKD-SHO (Hybrid Knowledge-based and Data-driven services based Smart HOme system), where knowledge-based and machine learning-based data-driven services are profitably integrated. The principal advantage is that it inherits the explicability of knowledge-based services and the dynamism of data-driven services. We compare HKD-SHO with several systems for creating dynamic smart home services, and the results show the better performance of HKD-SHO.
- [343] arXiv:2402.15522 [ pdf , ps , html , other ]
-
Title: IntSat: Integer Linear Programming by Conflict-Driven Constraint-LearningComments: 48 pages. This is the Author's Original Manuscript of the journal versionSubjects: Artificial Intelligence (cs.AI)
Abstract: State-of-the-art SAT solvers are nowadays able to handle huge real-world instances. The key to this success is the so-called Conflict-Driven Clause-Learning (CDCL) scheme, which encompasses a number of techniques that exploit the conflicts that are encountered during the search for a solution. In this article we extend these techniques to Integer Linear Programming (ILP), where variables may take general integer values instead of purely binary ones, constraints are more expressive than just propositional clauses, and there may be an objective function to optimise. We explain how these methods can be implemented efficiently, and discuss possible improvements. Our work is backed with a basic implementation that shows that, even in this far less mature stage, our techniques are already a useful complement to the state of the art in ILP solving.
- [344] arXiv:2402.15524 [ pdf , ps , html , other ]
-
Title: Graph Pruning for Enumeration of Minimal Unsatisfiable SubsetsSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Finding Minimal Unsatisfiable Subsets (MUSes) of binary constraints is a common problem in infeasibility analysis of over-constrained systems. However, because of the exponential search space of the problem, enumerating MUSes is extremely time-consuming in real applications. In this work, we propose to prune formulas using a learned model to speed up MUS enumeration. We represent formulas as graphs and then develop a graph-based learning model to predict which part of the formula should be pruned. Importantly, our algorithm does not require data labeling by only checking the satisfiability of pruned formulas. It does not even require training data from the target application because it extrapolates to data with different distributions. In our experiments we combine our algorithm with existing MUS enumerators and validate its effectiveness in multiple benchmarks including a set of real-world problems outside our training distribution. The experiment results show that our method significantly accelerates MUS enumeration on average on these benchmark problems.
- [345] arXiv:2402.15526 [ pdf , ps , html , other ]
-
Title: Chain-of-Specificity: An Iteratively Refining Method for Eliciting Knowledge from Large Language ModelsSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Large Language Models (LLMs) exhibit remarkable generative capabilities, enabling the generation of valuable information. Despite these advancements, previous research found that LLMs sometimes struggle with adhering to specific constraints (e.g., in specific place or at specific time), at times even overlooking them, which leads to responses that are either too generic or not fully satisfactory. Existing approaches attempted to address this issue by decomposing or rewriting input instructions, yet they fall short in adequately emphasizing specific constraints and in unlocking the underlying knowledge (e.g., programming within the context of software development). In response, this paper proposes a simple yet effective method named Chain-of-Specificity (CoS). Specifically, CoS iteratively emphasizes the specific constraints in the input instructions, unlocks knowledge within LLMs, and refines responses. Experiments conducted on publicly available and self-build complex datasets demonstrate that CoS outperforms existing methods in enhancing generated content especially for the specificity. Besides, as the number of specific constraints increase, other baselines falter, while CoS still performs well. Moreover, we show that distilling responses generated by CoS effectively enhances the ability of smaller models to follow the constrained instructions. Resources of this paper will be released for further research.
- [346] arXiv:2402.15572 [ pdf , ps , html , other ]
-
Title: Improving Explainable Object-induced Model through Uncertainty for Automated VehiclesComments: In Proceedings of the 2024 ACM / IEEE International Conference on Human-Robot Interaction (HRI '24), March 11--14, 2024, Boulder, CO, USA. ACM, New York, NY, USA, 9 pagesSubjects: Artificial Intelligence (cs.AI) ; Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract: The rapid evolution of automated vehicles (AVs) has the potential to provide safer, more efficient, and comfortable travel options. However, these systems face challenges regarding reliability in complex driving scenarios. Recent explainable AV architectures neglect crucial information related to inherent uncertainties while providing explanations for actions. To overcome such challenges, our study builds upon the "object-induced" model approach that prioritizes the role of objects in scenes for decision-making and integrates uncertainty assessment into the decision-making process using an evidential deep learning paradigm with a Beta prior. Additionally, we explore several advanced training strategies guided by uncertainty, including uncertainty-guided data reweighting and augmentation. Leveraging the BDD-OIA dataset, our findings underscore that the model, through these enhancements, not only offers a clearer comprehension of AV decisions and their underlying reasoning but also surpasses existing baselines across a broad range of scenarios.
- [347] arXiv:2402.15670 [ pdf , ps , html , other ]
-
Title: A mathematical model for simultaneous personnel shift planning and unrelated parallel machine schedulingSubjects: Artificial Intelligence (cs.AI) ; Discrete Mathematics (cs.DM)
Abstract: This paper addresses a production scheduling problem derived from an industrial use case, focusing on unrelated parallel machine scheduling with the personnel availability constraint. The proposed model optimizes the production plan over a multi-period scheduling horizon, accommodating variations in personnel shift hours within each time period. It assumes shared personnel among machines, with one personnel required per machine for setup and supervision during job processing. Available personnel are fewer than the machines, thus limiting the number of machines that can operate in parallel. The model aims to minimize the total production time considering machine-dependent processing times and sequence-dependent setup times. The model handles practical scenarios like machine eligibility constraints and production time windows. A Mixed Integer Linear Programming (MILP) model is introduced to formulate the problem, taking into account both continuous and district variables. A two-step solution approach enhances computational speed, first maximizing accepted jobs and then minimizing production time. Validation with synthetic problem instances and a real industrial case study of a food processing plant demonstrates the performance of the model and its usefulness in personnel shift planning. The findings offer valuable insights for practical managerial decision-making in the context of production scheduling.
- [348] arXiv:2402.15721 [ pdf , ps , html , other ]
-
Title: Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language ModelsChaoya Jiang , Wei Ye , Mengfan Dong , Hongrui Jia , Haiyang Xu , Ming Yan , Ji Zhang , Shikun ZhangSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Large Vision Language Models exhibit remarkable capabilities but struggle with hallucinations inconsistencies between images and their descriptions. Previous hallucination evaluation studies on LVLMs have identified hallucinations in terms of objects, attributes, and relations but overlooked complex hallucinations that create an entire narrative around a fictional entity. In this paper, we introduce a refined taxonomy of hallucinations, featuring a new category: Event Hallucination. We then utilize advanced LLMs to generate and filter fine grained hallucinatory data consisting of various types of hallucinations, with a particular focus on event hallucinations, laying the groundwork for integrating discriminative and generative evaluation methods within our universal evaluation framework. The proposed benchmark distinctively assesses LVLMs ability to tackle a broad spectrum of hallucinations, making it a reliable and comprehensive tool for gauging LVLMs efficacy in handling hallucinations. We will release our code and data.
- [349] arXiv:2402.15729 [ pdf , ps , html , other ]
-
Title: How Do Humans Write Code? Large Models Do It the Same Way TooSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Programming Languages (cs.PL)
Abstract: Large Language Models (LLMs) often make errors when performing numerical calculations. In contrast to traditional chain-of-thought reasoning, the program-of-thoughts approach involves generating executable code to solve problems. By executing this code, it achieves more precise results. Using generated executable code instead of natural language can reduce computational errors. However, we observe that when LLMs solve mathematical problems using code, they tend to generate more incorrect reasoning than when using natural language. To address this issue, we propose Human-Think Language (HTL), a straightforward yet highly efficient approach inspired by human coding practices. The approach first generates problem-solving methods described in the natural language by the model, then converts them into code, mirroring the process where people think through the logic in natural language before writing it as code. Additionally, it utilizes the Proximal Policy Optimization (PPO) algorithm, enabling it to provide feedback to itself based on the correctness of mathematical answers, much like humans do. Finally, we introduce a focus-attention mechanism that masks the question segment, enhancing its reliance on natural language inference solutions during code generation. We conduct our experiments without introducing any additional information, and the results across five mathematical calculation datasets showcase the effectiveness of our approach. Notably, on the NumGLUE dataset, the LlaMA-2-7B-based model achieves a superior performance rate (75.1%) compared to the previous best performance with the LlaMA-2-70B model (74.4%).
- [350] arXiv:2402.15796 [ pdf , ps , other ]
-
Title: Construction and application of artificial intelligence crowdsourcing map based on multi-track GPS dataSubjects: Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC)
Abstract: In recent years, the rapid development of high-precision map technology combined with artificial intelligence has ushered in a new development opportunity in the field of intelligent vehicles. High-precision map technology is an important guarantee for intelligent vehicles to achieve autonomous driving. However, due to the lack of research on high-precision map technology, it is difficult to rationally use this technology in the field of intelligent vehicles. Therefore, relevant researchers studied a fast and effective algorithm to generate high-precision GPS data from a large number of low-precision GPS trajectory data fusion, and generated several key data points to simplify the description of GPS trajectory, and realized the "crowdsourced update" model based on a large number of social vehicles for map data collection came into being. This kind of algorithm has the important significance to improve the data accuracy, reduce the measurement cost and reduce the data storage space. On this basis, this paper analyzes the implementation form of crowdsourcing map, so as to improve the various information data in the high-precision map according to the actual situation, and promote the high-precision map can be reasonably applied to the intelligent car.
- [351] arXiv:2402.15809 [ pdf , ps , other ]
-
Title: Empowering Large Language Model Agents through Action LearningHaiteng Zhao , Chang Ma , Guoyin Wang , Jing Su , Lingpeng Kong , Jingjing Xu , Zhi-Hong Deng , Hongxia YangComments: 9 pagesSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Large Language Model (LLM) Agents have recently garnered increasing interest yet they are limited in their ability to learn from trial and error, a key element of intelligent behavior. In this work, we argue that the capacity to learn new actions from experience is fundamental to the advancement of learning in LLM agents. While humans naturally expand their action spaces and develop skills through experiential learning, LLM agents typically operate within fixed action spaces, limiting their potential for growth. To address these challenges, our study explores open-action learning for language agents. We introduce a framework LearnAct with an iterative learning strategy to create and improve actions in the form of Python functions. In each iteration, LLM revises and updates the currently available actions based on the errors identified in unsuccessful training tasks, thereby enhancing action effectiveness. Our experimental evaluations across Robotic Planning and Alfworld environments reveal that after learning on a few training task instances, our approach to open-action learning markedly improves agent performance for the type of task (by 32 percent in AlfWorld compared to ReAct+Reflexion, for instance) highlighting the importance of experiential action learning in the development of more intelligent LLM agents.
- [352] arXiv:2402.15929 [ pdf , ps , html , other ]
-
Title: QuaCer-C: Quantitative Certification of Knowledge Comprehension in LLMsSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Large Language Models (LLMs) have demonstrated impressive performance on several benchmarks. However, traditional studies do not provide formal guarantees on the performance of LLMs. In this work, we propose a novel certification framework for LLM, QuaCer-C, wherein we formally certify the knowledge-comprehension capabilities of popular LLMs. Our certificates are quantitative - they consist of high-confidence, tight bounds on the probability that the target LLM gives the correct answer on any relevant knowledge comprehension prompt. Our certificates for the Llama, Vicuna, and Mistral LLMs indicate that the knowledge comprehension capability improves with an increase in the number of parameters and that the Mistral model is less performant than the rest in this evaluation.
- [353] arXiv:2402.15960 [ pdf , ps , html , other ]
-
Title: Budget-Constrained Tool Learning with PlanningSubjects: Artificial Intelligence (cs.AI)
Abstract: Despite intensive efforts devoted to tool learning, the problem of budget-constrained tool learning, which focuses on resolving user queries within a specific budget constraint, has been widely overlooked. This paper proposes a novel method for budget-constrained tool learning. Our approach involves creating a preferable plan under the budget constraint before utilizing the tools. This plan outlines the feasible tools and the maximum number of times they can be employed, offering a comprehensive overview of the tool learning process for large language models. This allows them to allocate the budget from a broader perspective. To devise the plan without incurring significant extra costs, we suggest initially estimating the usefulness of the candidate tools based on past experience. Subsequently, we employ dynamic programming to formulate the plan. Experimental results demonstrate that our method can be integrated with various tool learning methods, significantly enhancing their effectiveness under strict budget constraints.
- [354] arXiv:2402.15989 [ pdf , ps , html , other ]
-
Title: PIDformer: Transformer Meets Control TheorySubjects: Artificial Intelligence (cs.AI) ; Systems and Control (eess.SY)
Abstract: In this work, we address two main shortcomings of transformer architectures: input corruption and rank collapse in their output representation. We unveil self-attention as an autonomous state-space model that inherently promotes smoothness in its solutions, leading to lower-rank outputs and diminished representation capacity. Moreover, the steady-state solution of the model is sensitive to input perturbations. We incorporate a Proportional-Integral-Derivative (PID) closed-loop feedback control system with a reference point into the model to improve robustness and representation capacity. This integration aims to preserve high-frequency details while bolstering model stability, rendering it more noise-resilient. The resulting controlled state-space model is theoretically proven robust and adept at addressing the rank collapse. Motivated by this control framework, we derive a novel class of transformers, PID-controlled Transformer (PIDformer), aimed at improving robustness and mitigating the rank-collapse issue inherent in softmax transformers. We empirically evaluate the model for advantages and robustness against baseline transformers across various practical tasks, including object classification, image segmentation, and language modeling.
- [355] arXiv:2402.16269 [ pdf , ps , html , other ]
-
Title: From Large Language Models and Optimization to Decision Optimization CoPilot: A Research ManifestoSegev Wasserkrug , Leonard Boussioux , Dick den Hertog , Farzaneh Mirzazadeh , Ilker Birbil , Jannis Kurtz , Donato MaragnoSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Optimization and Control (math.OC)
Abstract: Significantly simplifying the creation of optimization models for real-world business problems has long been a major goal in applying mathematical optimization more widely to important business and societal decisions. The recent capabilities of Large Language Models (LLMs) present a timely opportunity to achieve this goal. Therefore, we propose research at the intersection of LLMs and optimization to create a Decision Optimization CoPilot (DOCP) - an AI tool designed to assist any decision maker, interacting in natural language to grasp the business problem, subsequently formulating and solving the corresponding optimization model. This paper outlines our DOCP vision and identifies several fundamental requirements for its implementation. We describe the state of the art through a literature survey and experiments using ChatGPT. We show that a) LLMs already provide substantial novel capabilities relevant to a DOCP, and b) major research challenges remain to be addressed. We also propose possible research directions to overcome these gaps. We also see this work as a call to action to bring together the LLM and optimization communities to pursue our vision, thereby enabling much more widespread improved decision-making.
- [356] arXiv:2402.16278 [ pdf , ps , html , other ]
-
Title: A Self-matching Training Method with Annotation Embedding Models for Ontology Subsumption PredictionComments: 21 pages, 6 figuresSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Recently, ontology embeddings representing entities in a low-dimensional space have been proposed for ontology completion. However, the ontology embeddings for concept subsumption prediction do not address the difficulties of similar and isolated entities and fail to extract the global information of annotation axioms from an ontology. In this paper, we propose a self-matching training method for the two ontology embedding models: Inverted-index Matrix Embedding (InME) and Co-occurrence Matrix Embedding (CoME). The two embeddings capture the global and local information in annotation axioms by means of the occurring locations of each word in a set of axioms and the co-occurrences of words in each axiom. The self-matching training method increases the robustness of the concept subsumption prediction when predicted superclasses are similar to subclasses and are isolated to other entities in an ontology. Our evaluation experiments show that the self-matching training method with InME outperforms the existing ontology embeddings for the GO and FoodOn ontologies and that the method with the concatenation of CoME and OWL2Vec* outperforms them for the HeLiS ontology.
- [357] arXiv:2402.16342 [ pdf , ps , html , other ]
-
Title: Contingency Planning Using Bi-level Markov Decision Processes for Space MissionsSubjects: Artificial Intelligence (cs.AI) ; Robotics (cs.RO)
Abstract: This work focuses on autonomous contingency planning for scientific missions by enabling rapid policy computation from any off-nominal point in the state space in the event of a delay or deviation from the nominal mission plan. Successful contingency planning involves managing risks and rewards, often probabilistically associated with actions, in stochastic scenarios. Markov Decision Processes (MDPs) are used to mathematically model decision-making in such scenarios. However, in the specific case of planetary rover traverse planning, the vast action space and long planning time horizon pose computational challenges. A bi-level MDP framework is proposed to improve computational tractability, while also aligning with existing mission planning practices and enhancing explainability and trustworthiness of AI-driven solutions. We discuss the conversion of a mission planning MDP into a bi-level MDP, and test the framework on RoverGridWorld, a modified GridWorld environment for rover mission planning. We demonstrate the computational tractability and near-optimal policies achievable with the bi-level MDP approach, highlighting the trade-offs between compute time and policy optimality as the problem's complexity grows. This work facilitates more efficient and flexible contingency planning in the context of scientific missions.
- [358] arXiv:2402.16482 [ pdf , ps , other ]
-
Title: On Languaging a Simulation EngineSubjects: Artificial Intelligence (cs.AI) ; Computational Engineering, Finance, and Science (cs.CE); Computation and Language (cs.CL)
Abstract: Language model intelligence is revolutionizing the way we program materials simulations. However, the diversity of simulation scenarios renders it challenging to precisely transform human language into a tailored simulator. Here, using three functionalized types of language model, we propose a language-to-simulation (Lang2Sim) framework that enables interactive navigation on languaging a simulation engine, by taking a scenario instance of water sorption in porous matrices. Unlike line-by-line coding of a target simulator, the language models interpret each simulator as an assembly of invariant tool function and its variant input-output pair. Lang2Sim enables the precise transform of textual description by functionalizing and sequentializing the language models of, respectively, rationalizing the tool categorization, customizing its input-output combinations, and distilling the simulator input into executable format. Importantly, depending on its functionalized type, each language model features a distinct processing of chat history to best balance its memory limit and information completeness, thus leveraging the model intelligence to unstructured nature of human request. Overall, this work establishes language model as an intelligent platform to unlock the era of languaging a simulation engine.
- [359] arXiv:2402.16505 [ pdf , ps , html , other ]
-
Title: Memory GAPS: Would LLMs pass the Tulving Test?Comments: 15 pages, 3 figuresSubjects: Artificial Intelligence (cs.AI)
Abstract: The Tulving Test was designed to investigate memory performance in recognition and recall tasks. Its results help assess the relevance of the "Synergistic Ecphory Model" of memory and similar RK paradigms in human performance. This paper starts investigating whether the more than forty-year-old framework sheds some light on LLMs' acts of remembering.
- [360] arXiv:2402.16631 [ pdf , ps , html , other ]
-
Title: GenAINet: Enabling Wireless Collective Intelligence via Knowledge Transfer and ReasoningHang Zou , Qiyang Zhao , Lina Bariah , Yu Tian , Mehdi Bennis , Samson Lasaulce , Merouane Debbah , Faouzi BaderSubjects: Artificial Intelligence (cs.AI) ; Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract: Generative artificial intelligence (GenAI) and communication networks are expected to have groundbreaking synergies in 6G. Connecting GenAI agents over a wireless network can potentially unleash the power of collective intelligence and pave the way for artificial general intelligence (AGI). However, current wireless networks are designed as a "data pipe" and are not suited to accommodate and leverage the power of GenAI. In this paper, we propose the GenAINet framework in which distributed GenAI agents communicate knowledge (high-level concepts or abstracts) to accomplish arbitrary tasks. We first provide a network architecture integrating GenAI capabilities to manage both network protocols and applications. Building on this, we investigate effective communication and reasoning problems by proposing a semantic-native GenAINet. Specifically, GenAI agents extract semantic concepts from multi-modal raw data, build a knowledgebase representing their semantic relations, which is retrieved by GenAI models for planning and reasoning. Under this paradigm, an agent can learn fast from other agents' experience for making better decisions with efficient communications. Furthermore, we conduct two case studies where in wireless device query, we show that extracting and transferring knowledge can improve query accuracy with reduced communication; and in wireless power control, we show that distributed agents can improve decisions via collaborative reasoning. Finally, we address that developing a hierarchical semantic level Telecom world model is a key path towards network of collective intelligence.
- [361] arXiv:2402.16651 [ pdf , ps , other ]
-
Title: A Comprehensive Survey of Belief Rule Base (BRB) Hybrid Expert system: Bridging Decision Science and Professional ServicesSubjects: Artificial Intelligence (cs.AI) ; Computers and Society (cs.CY)
Abstract: The Belief Rule Base (BRB) system that adopts a hybrid approach integrating the precision of expert systems with the adaptability of data-driven models. Characterized by its use of if-then rules to accommodate various types of uncertainty through belief degrees, BRB adeptly handles fuzziness, randomness, and ignorance. This semi-quantitative tool excels in processing both numerical data and linguistic knowledge from diverse sources, making it as an indispensable resource in modelling complex nonlinear systems. Notably, BRB's transparent, white-box nature ensures accessibility and clarity for decision-makers and stakeholders, further enhancing its applicability. With its growing adoption in fields ranging from decision-making and reliability evaluation in network security and fault diagnosis, this study aims to explore the evolution and the multifaceted applications of BRB. By analysing its development across different domains, we highlight BRB's potential to revolutionize sectors traditionally resistant to technological disruption, in particular insurance and law.
- [362] arXiv:2402.16654 [ pdf , ps , html , other ]
-
Title: GigaPevt: Multimodal Medical AssistantPavel Blinov , Konstantin Egorov , Ivan Sviridov , Nikolay Ivanov , Stepan Botman , Evgeniy Tagin , Stepan Kudin , Galina Zubkova , Andrey SavchenkoComments: 4 pages, 2 figures, 2 tablesSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Abstract: Building an intelligent and efficient medical assistant is still a challenging AI problem. The major limitation comes from the data modality scarceness, which reduces comprehensive patient perception. This demo paper presents the GigaPevt, the first multimodal medical assistant that combines the dialog capabilities of large language models with specialized medical models. Such an approach shows immediate advantages in dialog quality and metric performance, with a 1.18\% accuracy improvement in the question-answering task.
- [363] arXiv:2402.16751 [ pdf , ps , other ]
-
Title: Value Preferences Estimation and Disambiguation in Hybrid Participatory SystemsComments: Under reviewSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Understanding citizens' values in participatory systems is crucial for citizen-centric policy-making. We envision a hybrid participatory system where participants make choices and provide motivations for those choices, and AI agents estimate their value preferences by interacting with them. We focus on situations where a conflict is detected between participants' choices and motivations, and propose methods for estimating value preferences while addressing detected inconsistencies by interacting with the participants. We operationalize the philosophical stance that "valuing is deliberatively consequential." That is, if a participant's choice is based on a deliberation of value preferences, the value preferences can be observed in the motivation the participant provides for the choice. Thus, we propose and compare value estimation methods that prioritize the values estimated from motivations over the values estimated from choices alone. Then, we introduce a disambiguation strategy that addresses the detected inconsistencies between choices and motivations by directly interacting with the participants. We evaluate the proposed methods on a dataset of a large-scale survey on energy transition. The results show that explicitly addressing inconsistencies between choices and motivations improves the estimation of an individual's value preferences. The disambiguation strategy does not show substantial improvements when compared to similar baselines--however, we discuss how the novelty of the approach can open new research avenues and propose improvements to address the current limitations.
- [364] arXiv:2402.16823 [ pdf , ps , other ]
-
Title: Language Agents as Optimizable GraphsMingchen Zhuge , Wenyi Wang , Louis Kirsch , Francesco Faccio , Dmitrii Khizbullin , Jürgen SchmidhuberSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Abstract: Various human-designed prompt engineering techniques have been proposed to improve problem solvers based on Large Language Models (LLMs), yielding many disparate code bases. We unify these approaches by describing LLM-based agents as computational graphs. The nodes implement functions to process multimodal data or query LLMs, and the edges describe the information flow between operations. Graphs can be recursively combined into larger composite graphs representing hierarchies of inter-agent collaboration (where edges connect operations of different agents). Our novel automatic graph optimizers (1) refine node-level LLM prompts (node optimization) and (2) improve agent orchestration by changing graph connectivity (edge optimization). Experiments demonstrate that our framework can be used to efficiently develop, integrate, and automatically improve various LLM agents. The code can be found at this https URL .
- [365] arXiv:2402.16878 [ pdf , ps , html , other ]
-
Title: EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math LanguagesSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Abstract: Formal mathematics is the discipline of translating mathematics into a programming language in which any statement can be unequivocally checked by a computer. Mathematicians and computer scientists have spent decades of painstaking formalization efforts developing languages such as Coq, HOL, and Lean. Machine learning research has converged on these formal math corpora and given rise to an assortment of methodologies to aid in interactive and automated theorem proving. However, these papers have primarily focused on one method, for one proof task, in one language. This paper introduces EvoGPT-f: a novel evolutionary framework for the first systematic quantitative analysis of the differential machine learnability of five formal math corpora (Lean 3, Lean 4, Coq, HOL 4, HOL Light) using four tokenization methods (character, word-level, Byte Pair Encoding and StarCoder tokenizer). This paper does not put to rest the question of the "best" or "easiest" language to learn. Rather, this framework and preliminary findings begin to illuminate the differential machine learnability of these languages, offering a foundation to forge more systematic quantitative and qualitative comparative research across communities.
- [366] arXiv:2402.16905 [ pdf , ps , html , other ]
-
Title: Enforcing Temporal Constraints on Generative Agent Behavior with Reactive SynthesisComments: 22 pagesSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Logic in Computer Science (cs.LO)
Abstract: The surge in popularity of Large Language Models (LLMs) has opened doors for new approaches to the creation of interactive agents. However, managing the temporal behavior of such agents over the course of an interaction remains challenging. The stateful, long-term horizon and quantitative reasoning required for coherent agent behavior does not fit well into the LLM paradigm. We propose a combination of formal logic-based program synthesis and LLM content generation to create generative agents that adhere to temporal constraints. Our approach uses Temporal Stream Logic (TSL) to generate an automaton that enforces a temporal structure on an agent and leaves the details of each action for a moment in time to an LLM. By using TSL, we are able to augment the generative agent where users have a higher level of guarantees on behavior, better interpretability of the system, and more ability to build agents in a modular way. We evaluate our approach on different tasks involved in creating a coherent interactive agent specialized for various application domains. We found that over all of the tasks, our approach using TSL achieves at least 96% adherence, whereas the pure LLM-based approach demonstrates as low as 14.67% adherence.
- [367] arXiv:2402.16924 [ pdf , ps , other ]
-
Title: Theoretical Unification of the Fractured Aspects of InformationComments: 52 pagesSubjects: Artificial Intelligence (cs.AI)
Abstract: The article has as its main objective the identification of fundamental epistemological obstacles in the study of information related to unnecessary methodological assumptions and the demystification of popular beliefs in the fundamental divisions of the aspects of information that can be understood as Bachelardian rupture of epistemological obstacles. These general considerations are preceded by an overview of the motivations for the study of information and the role of the concept of information in the conceptualization of intelligence, complexity, and consciousness justifying the need for a sufficiently general perspective in the study of information, and are followed at the end of the article by a brief exposition of an example of a possible application in the development of the unified theory of information free from unnecessary divisions and claims of superiority of the existing preferences in methodology. The reference to Gaston Bachelard and his ideas of epistemological obstacles and epistemological ruptures seems highly appropriate for the reflection on the development of information study, in particular in the context of obstacles such as the absence of semantics of information, negligence of its structural analysis, separation of its digital and analog forms, and misguided use of mathematics.
- [368] arXiv:2402.16973 [ pdf , ps , html , other ]
-
Title: Successfully Guiding Humans with Imperfect Instructions by Highlighting Potential Errors and Suggesting CorrectionsSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Abstract: This paper addresses the challenge of leveraging imperfect language models to guide human decision-making in the context of a grounded navigation task. We show that an imperfect instruction generation model can be complemented with an effective communication mechanism to become more successful at guiding humans. The communication mechanism we build comprises models that can detect potential hallucinations in instructions and suggest practical alternatives, and an intuitive interface to present that information to users. We show that this approach reduces the human navigation error by up to 29% with no additional cognitive burden. This result underscores the potential of integrating diverse communication channels into AI systems to compensate for their imperfections and enhance their utility for humans.
- [369] arXiv:2402.17032 [ pdf , ps , html , other ]
-
Title: REFACTOR: Learning to Extract Theorems from ProofsComments: ICLR 2024Subjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Human mathematicians are often good at recognizing modular and reusable theorems that make complex mathematical results within reach. In this paper, we propose a novel method called theoREm-from-prooF extrACTOR (REFACTOR) for training neural networks to mimic this ability in formal mathematical theorem proving. We show on a set of unseen proofs, REFACTOR is able to extract 19.6% of the theorems that humans would use to write the proofs. When applying the model to the existing Metamath library, REFACTOR extracted 16 new theorems. With newly extracted theorems, we show that the existing proofs in the MetaMath database can be refactored. The new theorems are used very frequently after refactoring, with an average usage of 733.5 times, and help shorten the proof lengths. Lastly, we demonstrate that the prover trained on the new-theorem refactored dataset proves more test theorems and outperforms state-of-the-art baselines by frequently leveraging a diverse set of newly extracted theorems. Code can be found at this https URL .
- [370] arXiv:2402.17161 [ pdf , ps , html , other ]
-
Title: Large Language Model for Participatory Urban PlanningComments: arXiv admin note: text overlap with arXiv:2402.01698Subjects: Artificial Intelligence (cs.AI) ; Multiagent Systems (cs.MA)
Abstract: Participatory urban planning is the mainstream of modern urban planning that involves the active engagement of residents. However, the traditional participatory paradigm requires experienced planning experts and is often time-consuming and costly. Fortunately, the emerging Large Language Models (LLMs) have shown considerable ability to simulate human-like agents, which can be used to emulate the participatory process easily. In this work, we introduce an LLM-based multi-agent collaboration framework for participatory urban planning, which can generate land-use plans for urban regions considering the diverse needs of residents. Specifically, we construct LLM agents to simulate a planner and thousands of residents with diverse profiles and backgrounds. We first ask the planner to carry out an initial land-use plan. To deal with the different facilities needs of residents, we initiate a discussion among the residents in each community about the plan, where residents provide feedback based on their profiles. Furthermore, to improve the efficiency of discussion, we adopt a fishbowl discussion mechanism, where part of the residents discuss and the rest of them act as listeners in each round. Finally, we let the planner modify the plan based on residents' feedback. We deploy our method on two real-world regions in Beijing. Experiments show that our method achieves state-of-the-art performance in residents satisfaction and inclusion metrics, and also outperforms human experts in terms of service accessibility and ecology metrics.
- [371] arXiv:2402.17168 [ pdf , ps , html , other ]
-
Title: Benchmarking Data Science AgentsComments: Source code and data are available at this https URLSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: In the era of data-driven decision-making, the complexity of data analysis necessitates advanced expertise and tools of data science, presenting significant challenges even for specialists. Large Language Models (LLMs) have emerged as promising aids as data science agents, assisting humans in data analysis and processing. Yet their practical efficacy remains constrained by the varied demands of real-world applications and complicated analytical process. In this paper, we introduce DSEval -- a novel evaluation paradigm, as well as a series of innovative benchmarks tailored for assessing the performance of these agents throughout the entire data science lifecycle. Incorporating a novel bootstrapped annotation method, we streamline dataset preparation, improve the evaluation coverage, and expand benchmarking comprehensiveness. Our findings uncover prevalent obstacles and provide critical insights to inform future advancements in the field.
- [372] arXiv:2402.17270 [ pdf , ps , html , other ]
-
Title: Multi-Agent, Human-Agent and Beyond: A Survey on Cooperation in Social DilemmasSubjects: Artificial Intelligence (cs.AI) ; Computer Science and Game Theory (cs.GT); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Abstract: The study of cooperation within social dilemmas has long been a fundamental topic across various disciplines, including computer science and social science. Recent advancements in Artificial Intelligence (AI) have significantly reshaped this field, offering fresh insights into understanding and enhancing cooperation. This survey examines three key areas at the intersection of AI and cooperation in social dilemmas. First, focusing on multi-agent cooperation, we review the intrinsic and external motivations that support cooperation among rational agents, and the methods employed to develop effective strategies against diverse opponents. Second, looking into human-agent cooperation, we discuss the current AI algorithms for cooperating with humans and the human biases towards AI agents. Third, we review the emergent field of leveraging AI agents to enhance cooperation among humans. We conclude by discussing future research avenues, such as using large language models, establishing unified theoretical frameworks, revisiting existing theories of human cooperation, and exploring multiple real-world applications.
- [373] arXiv:2402.17385 [ pdf , ps , html , other ]
-
Title: Determinants of LLM-assisted Decision-MakingSubjects: Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC)
Abstract: Decision-making is a fundamental capability in everyday life. Large Language Models (LLMs) provide multifaceted support in enhancing human decision-making processes. However, understanding the influencing factors of LLM-assisted decision-making is crucial for enabling individuals to utilize LLM-provided advantages and minimize associated risks in order to make more informed and better decisions. This study presents the results of a comprehensive literature analysis, providing a structural overview and detailed analysis of determinants impacting decision-making with LLM support. In particular, we explore the effects of technological aspects of LLMs, including transparency and prompt engineering, psychological factors such as emotions and decision-making styles, as well as decision-specific determinants such as task difficulty and accountability. In addition, the impact of the determinants on the decision-making process is illustrated via multiple application scenarios. Drawing from our analysis, we develop a dependency framework that systematizes possible interactions in terms of reciprocal interdependencies between these determinants. Our research reveals that, due to the multifaceted interactions with various determinants, factors such as trust in or reliance on LLMs, the user's mental model, and the characteristics of information processing are identified as significant aspects influencing LLM-assisted decision-making processes. Our findings can be seen as crucial for improving decision quality in human-AI collaboration, empowering both users and organizations, and designing more effective LLM interfaces. Additionally, our work provides a foundation for future empirical investigations on the determinants of decision-making assisted by LLMs.
- [374] arXiv:2402.17431 [ pdf , ps , html , other ]
-
Title: The KANDY Benchmark: Incremental Neuro-Symbolic Learning and Reasoning with Kandinsky PatternsSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Artificial intelligence is continuously seeking novel challenges and benchmarks to effectively measure performance and to advance the state-of-the-art. In this paper we introduce KANDY, a benchmarking framework that can be used to generate a variety of learning and reasoning tasks inspired by Kandinsky patterns. By creating curricula of binary classification tasks with increasing complexity and with sparse supervisions, KANDY can be used to implement benchmarks for continual and semi-supervised learning, with a specific focus on symbol compositionality. Classification rules are also provided in the ground truth to enable analysis of interpretable solutions. Together with the benchmark generation pipeline, we release two curricula, an easier and a harder one, that we propose as new challenges for the research community. With a thorough experimental evaluation, we show how both state-of-the-art neural models and purely symbolic approaches struggle with solving most of the tasks, thus calling for the application of advanced neuro-symbolic methods trained over time.
- [375] arXiv:2402.17546 [ pdf , ps , html , other ]
-
Title: COCOA: CBT-based Conversational Counseling Agent using Memory Specialized in Cognitive Distortions and Dynamic PromptComments: 4 pages, 2 figuresSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: The demand for conversational agents that provide mental health care is consistently increasing. In this work, we develop a psychological counseling agent, referred to as CoCoA, that applies Cognitive Behavioral Therapy (CBT) techniques to identify and address cognitive distortions inherent in the client's statements. Specifically, we construct a memory system to efficiently manage information necessary for counseling while extracting high-level insights about the client from their utterances. Additionally, to ensure that the counseling agent generates appropriate responses, we introduce dynamic prompting to flexibly apply CBT techniques and facilitate the appropriate retrieval of information. We conducted dialogues between CoCoA and characters from this http URL , creating a dataset for evaluation. Then, we asked GPT to evaluate the constructed counseling dataset, and our model demonstrated a statistically significant difference from other models.
- [376] arXiv:2402.17553 [ pdf , ps , html , other ]
-
Title: OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and WebRaghav Kapoor , Yash Parag Butala , Melisa Russak , Jing Yu Koh , Kiran Kamble , Waseem Alshikh , Ruslan SalakhutdinovSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
Abstract: For decades, human-computer interaction has fundamentally been manual. Even today, almost all productive work done on the computer necessitates human input at every step. Autonomous virtual agents represent an exciting step in automating many of these menial tasks. Virtual agents would empower users with limited technical proficiency to harness the full possibilities of computer systems. They could also enable the efficient streamlining of numerous computer tasks, ranging from calendar management to complex travel bookings, with minimal human intervention. In this paper, we introduce OmniACT, the first-of-a-kind dataset and benchmark for assessing an agent's capability to generate executable programs to accomplish computer tasks. Our scope extends beyond traditional web automation, covering a diverse range of desktop applications. The dataset consists of fundamental tasks such as "Play the next song", as well as longer horizon tasks such as "Send an email to John Doe mentioning the time and place to meet". Specifically, given a pair of screen image and a visually-grounded natural language task, the goal is to generate a script capable of fully executing the task. We run several strong baseline language model agents on our benchmark. The strongest baseline, GPT-4, performs the best on our benchmark However, its performance level still reaches only 15% of the human proficiency in generating executable scripts capable of completing the task, demonstrating the challenge of our task for conventional web agents. Our benchmark provides a platform to measure and evaluate the progress of language model agents in automating computer tasks and motivates future work towards building multimodal models that bridge large language models and the visual grounding of computer screens.
- [377] arXiv:2402.17574 [ pdf , ps , html , other ]
-
Title: Agent-Pro: Learning to Evolve via Policy-Level Reflection and OptimizationWenqi Zhang , Ke Tang , Hai Wu , Mengna Wang , Yongliang Shen , Guiyang Hou , Zeqi Tan , Peng Li , Yueting Zhuang , Weiming LuComments: LLM-based AgentSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Large Language Models exhibit robust problem-solving capabilities for diverse tasks. However, most LLM-based agents are designed as specific task solvers with sophisticated prompt engineering, rather than agents capable of learning and evolving through interactions. These task solvers necessitate manually crafted prompts to inform task rules and regulate LLM behaviors, inherently incapacitating to address complex dynamic scenarios e.g., large interactive games. In light of this, we propose Agent-Pro: an LLM-based Agent with Policy-level Reflection and Optimization that can learn a wealth of expertise from interactive experiences and progressively elevate its behavioral policy. Specifically, it involves a dynamic belief generation and reflection process for policy evolution. Rather than action-level reflection, Agent-Pro iteratively reflects on past trajectories and beliefs, fine-tuning its irrational beliefs for a better policy. Moreover, a depth-first search is employed for policy optimization, ensuring continual enhancement in policy payoffs. Agent-Pro is evaluated across two games: Blackjack and Texas Hold'em, outperforming vanilla LLM and specialized models. Our results show Agent-Pro can learn and evolve in complex and dynamic scenes, which also benefits numerous LLM-based applications.
- [378] arXiv:2402.17709 [ pdf , ps , html , other ]
-
Title: Case-Based or Rule-Based: How Do Transformers Do the Math?Subjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Despite the impressive performance in a variety of complex tasks, modern large language models (LLMs) still have trouble dealing with some math problems that are simple and intuitive for humans, such as addition. While we can easily learn basic rules of addition and apply them to new problems of any length, LLMs struggle to do the same. Instead, they may rely on similar "cases" seen in the training corpus for help. We define these two different reasoning mechanisms as "rule-based reasoning" and "case-based reasoning". Since rule-based reasoning is essential for acquiring the systematic generalization ability, we aim to explore exactly whether transformers use rule-based or case-based reasoning for math problems. Through carefully designed intervention experiments on five math tasks, we confirm that transformers are performing case-based reasoning, no matter whether scratchpad is used, which aligns with the previous observations that transformers use subgraph matching/shortcut learning to reason. To mitigate such problems, we propose a Rule-Following Fine-Tuning (RFFT) technique to teach transformers to perform rule-based reasoning. Specifically, we provide explicit rules in the input and then instruct transformers to recite and follow the rules step by step. Through RFFT, we successfully enable LLMs fine-tuned on 1-5 digit addition to generalize to up to 12-digit addition with over 95% accuracy, which is over 40% higher than scratchpad. The significant improvement demonstrates that teaching LLMs to explicitly use rules helps them learn rule-based reasoning and generalize better in length.
- [379] arXiv:2402.17739 [ pdf , ps , html , other ]
-
Title: reBandit: Random Effects based Online RL algorithm for Reducing Cannabis UseSusobhan Ghosh , Yongyi Guo , Pei-Yao Hung , Lara Coughlin , Erin Bonar , Inbal Nahum-Shani , Maureen Walton , Susan MurphySubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: The escalating prevalence of cannabis use, and associated cannabis-use disorder (CUD), poses a significant public health challenge globally. With a notably wide treatment gap, especially among emerging adults (EAs; ages 18-25), addressing cannabis use and CUD remains a pivotal objective within the 2030 United Nations Agenda for Sustainable Development Goals (SDG). In this work, we develop an online reinforcement learning (RL) algorithm called reBandit which will be utilized in a mobile health study to deliver personalized mobile health interventions aimed at reducing cannabis use among EAs. reBandit utilizes random effects and informative Bayesian priors to learn quickly and efficiently in noisy mobile health environments. Moreover, reBandit employs Empirical Bayes and optimization techniques to autonomously update its hyper-parameters online. To evaluate the performance of our algorithm, we construct a simulation testbed using data from a prior study, and compare against commonly used algorithms in mobile health studies. We show that reBandit performs equally well or better than all the baseline algorithms, and the performance gap widens as population heterogeneity increases in the simulation environment, proving its adeptness to adapt to diverse population of study participants.
- [380] arXiv:2402.17786 [ pdf , ps , html , other ]
-
Title: Stepwise Self-Consistent Mathematical Reasoning with Large Language ModelsSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Using Large Language Models for complex mathematical reasoning is difficult, primarily due to the complexity of multi-step reasoning. The main challenges of this process include (1) selecting critical intermediate results to advance the procedure, and (2) limited exploration of potential solutions. To address these issues, we introduce a novel algorithm, namely Stepwise Self-Consistent Chain-of-Thought (SSC-CoT). SSC-CoT employs a strategy of selecting intermediate steps based on the intersection of various reasoning chains. Additionally, SSC-CoT enables the model to discover critical intermediate steps by querying a knowledge graph comprising relevant domain knowledge. To validate SSC-CoT, we present a new dataset, TriMaster100, tailored for complex trigonometry problems. This dataset contains 100 questions, with each solution broken down into scored intermediate steps, facilitating a comprehensive evaluation of the mathematical reasoning process. On TriMaster100, SSC-CoT triples the effectiveness of the state-of-the-art methods. Furthermore, we benchmark SSC-CoT on the widely recognized complex mathematical question dataset, MATH level 5, and it surpasses the second-best method by 7.2% in accuracy. Code and the TriMaster100 dataset can be found at: this https URL .
- [381] arXiv:2402.17791 [ pdf , ps , html , other ]
-
Title: Label Informed Contrastive Pretraining for Node Importance Estimation on Knowledge GraphsComments: Accepted by IEEE TNNLSSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Abstract: Node Importance Estimation (NIE) is a task of inferring importance scores of the nodes in a graph. Due to the availability of richer data and knowledge, recent research interests of NIE have been dedicating to knowledge graphs for predicting future or missing node importance scores. Existing state-of-the-art NIE methods train the model by available labels, and they consider every interested node equally before training. However, the nodes with higher importance often require or receive more attention in real-world scenarios, e.g., people may care more about the movies or webpages with higher importance. To this end, we introduce Label Informed ContrAstive Pretraining (LICAP) to the NIE problem for being better aware of the nodes with high importance scores. Specifically, LICAP is a novel type of contrastive learning framework that aims to fully utilize the continuous labels to generate contrastive samples for pretraining embeddings. Considering the NIE problem, LICAP adopts a novel sampling strategy called top nodes preferred hierarchical sampling to first group all interested nodes into a top bin and a non-top bin based on node importance scores, and then divide the nodes within top bin into several finer bins also based on the scores. The contrastive samples are generated from those bins, and are then used to pretrain node embeddings of knowledge graphs via a newly proposed Predicate-aware Graph Attention Networks (PreGAT), so as to better separate the top nodes from non-top nodes, and distinguish the top nodes within top bin by keeping the relative order among finer bins. Extensive experiments demonstrate that the LICAP pretrained embeddings can further boost the performance of existing NIE methods and achieve the new state-of-the-art performance regarding both regression and ranking metrics. The source code for reproducibility is available at this https URL
- [382] arXiv:2402.17793 [ pdf , ps , html , other ]
-
Title: A Surprising Failure? Multimodal LLMs and the NLVR ChallengeSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: This study evaluates three state-of-the-art MLLMs -- GPT-4V, Gemini Pro, and the open-source model IDEFICS -- on the compositional natural language vision reasoning task NLVR. Given a human-written sentence paired with a synthetic image, this task requires the model to determine the truth value of the sentence with respect to the image. Despite the strong performance demonstrated by these models, we observe they perform poorly on NLVR, which was constructed to require compositional and spatial reasoning, and to be robust for semantic and systematic biases.
- [383] arXiv:2402.17930 [ pdf , ps , html , other ]
-
Title: Pragmatic Instruction Following and Goal Assistance via Cooperative Language-Guided Inverse PlanningComments: Accepted to AAMAS 2024. 8 pages (excl. references), 5 figures/tables. (Appendix: 8 pages, 8 figures/tables). Code available at: this https URLSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: People often give instructions whose meaning is ambiguous without further context, expecting that their actions or goals will disambiguate their intentions. How can we build assistive agents that follow such instructions in a flexible, context-sensitive manner? This paper introduces cooperative language-guided inverse plan search (CLIPS), a Bayesian agent architecture for pragmatic instruction following and goal assistance. Our agent assists a human by modeling them as a cooperative planner who communicates joint plans to the assistant, then performs multimodal Bayesian inference over the human's goal from actions and language, using large language models (LLMs) to evaluate the likelihood of an instruction given a hypothesized plan. Given this posterior, our assistant acts to minimize expected goal achievement cost, enabling it to pragmatically follow ambiguous instructions and provide effective assistance even when uncertain about the goal. We evaluate these capabilities in two cooperative planning domains (Doors, Keys & Gems and VirtualHome), finding that CLIPS significantly outperforms GPT-4V, LLM-based literal instruction following and unimodal inverse planning in both accuracy and helpfulness, while closely matching the inferences and assistive judgments provided by human raters.
- [384] arXiv:2402.17975 [ pdf , ps , html , other ]
-
Title: Sample-Efficient Preference-based Reinforcement Learning with Dynamics Aware RewardsComments: CoRL 2023. arXiv admin note: substantial text overlap with arXiv:2211.06527Subjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Preference-based reinforcement learning (PbRL) aligns a robot behavior with human preferences via a reward function learned from binary feedback over agent behaviors. We show that dynamics-aware reward functions improve the sample efficiency of PbRL by an order of magnitude. In our experiments we iterate between: (1) learning a dynamics-aware state-action representation (z^{sa}) via a self-supervised temporal consistency task, and (2) bootstrapping the preference-based reward function from (z^{sa}), which results in faster policy learning and better final policy performance. For example, on quadruped-walk, walker-walk, and cheetah-run, with 50 preference labels we achieve the same performance as existing approaches with 500 preference labels, and we recover 83\% and 66\% of ground truth reward policy performance versus only 38\% and 21\%. The performance gains demonstrate the benefits of explicitly learning a dynamics-aware reward model. Repo: \texttt{ this https URL }.
- [385] arXiv:2402.18023 [ pdf , ps , html , other ]
-
Title: Do Large Language Models Mirror Cognitive Language Processing?Subjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in text comprehension and logical reasoning, achiving or even surpassing human-level performance in numerous cognition tasks. As LLMs are trained from massive textual outputs of human language cognition, it is natural to ask whether LLMs mirror cognitive language processing. Or to what extend LLMs resemble cognitive language processing? In this paper, we propose a novel method that bridge between LLM representations and human cognition signals to evaluate how effectively LLMs simulate cognitive language processing. We employ Representational Similarity Analysis (RSA) to mearsure the alignment between 16 mainstream LLMs and fMRI signals of the brain. We empirically investigate the impact of a variety of factors (e.g., model scaling, alignment training, instruction appending) on such LLM-brain alignment. Experimental results indicate that model scaling is positively correlated with LLM-brain similarity, and alignment training can significantly improve LLM-brain similarity. Additionally, the performance of a wide range of LLM evaluations (e.g., MMLU, Chatbot Arena) is highly correlated with the LLM-brain similarity.
- [386] arXiv:2402.18040 [ pdf , ps , html , other ]
-
Title: Automated Discovery of Integral with Deep LearningSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Recent advancements in the realm of deep learning, particularly in the development of large language models (LLMs), have demonstrated AI's ability to tackle complex mathematical problems or solving programming challenges. However, the capability to solve well-defined problems based on extensive training data differs significantly from the nuanced process of making scientific discoveries. Trained on almost all human knowledge available, today's sophisticated LLMs basically learn to predict sequences of tokens. They generate mathematical derivations and write code in a similar way as writing an essay, and do not have the ability to pioneer scientific discoveries in the manner a human scientist would do.
In this study we delve into the potential of using deep learning to rediscover a fundamental mathematical concept: integrals. By defining integrals as area under the curve, we illustrate how AI can deduce the integral of a given function, exemplified by inferring $\int_{0}^{x} t^2 dt = \frac{x^3}{3}$ and $\int_{0}^{x} ae^{bt} dt = \frac{a}{b} e^{bx} - \frac{a}{b}$. Our experiments show that deep learning models can approach the task of inferring integrals either through a sequence-to-sequence model, akin to language translation, or by uncovering the rudimentary principles of integration, such as $\int_{0}^{x} t^n dt = \frac{x^{n+1}}{n+1}$. - [387] arXiv:2402.18144 [ pdf , ps , other ]
-
Title: Random Silicon Sampling: Simulating Human Sub-Population Opinion Using a Large Language Model Based on Group-Level Demographic InformationSeungjong Sun , Eungu Lee , Dongyan Nan , Xiangying Zhao , Wonbyung Lee , Bernard J. Jansen , Jang Hyun KimComments: 25 pages, 4 figures, 19 TablesSubjects: Artificial Intelligence (cs.AI) ; Computers and Society (cs.CY)
Abstract: Large language models exhibit societal biases associated with demographic information, including race, gender, and others. Endowing such language models with personalities based on demographic data can enable generating opinions that align with those of humans. Building on this idea, we propose "random silicon sampling," a method to emulate the opinions of the human population sub-group. Our study analyzed 1) a language model that generates the survey responses that correspond with a human group based solely on its demographic distribution and 2) the applicability of our methodology across various demographic subgroups and thematic questions. Through random silicon sampling and using only group-level demographic information, we discovered that language models can generate response distributions that are remarkably similar to the actual U.S. public opinion polls. Moreover, we found that the replicability of language models varies depending on the demographic group and topic of the question, and this can be attributed to inherent societal biases in the models. Our findings demonstrate the feasibility of mirroring a group's opinion using only demographic distribution and elucidate the effect of social biases in language models on such simulations.
- [388] arXiv:2402.18157 [ pdf , ps , html , other ]
-
Title: From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIsYulong Liu , Yunlong Yuan , Chunwei Wang , Jianhua Han , Yongqiang Ma , Li Zhang , Nanning Zheng , Hang XuSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Abstract: The distinction between humans and animals lies in the unique ability of humans to use and create tools. Tools empower humans to overcome physiological limitations, fostering the creation of magnificent civilizations. Similarly, enabling foundational models like Large Language Models (LLMs) with the capacity to learn external tool usage may serve as a pivotal step toward realizing artificial general intelligence. Previous studies in this field have predominantly pursued two distinct approaches to augment the tool invocation capabilities of LLMs. The first approach emphasizes the construction of relevant datasets for model fine-tuning. The second approach, in contrast, aims to fully exploit the inherent reasoning abilities of LLMs through in-context learning strategies. In this work, we introduce a novel tool invocation pipeline designed to control massive real-world APIs. This pipeline mirrors the human task-solving process, addressing complicated real-life user queries. At each step, we guide LLMs to summarize the achieved results and determine the next course of action. We term this pipeline `from Summary to action', Sum2Act for short. Empirical evaluations of our Sum2Act pipeline on the ToolBench benchmark show significant performance improvements, outperforming established methods like ReAct and DFSDT. This highlights Sum2Act's effectiveness in enhancing LLMs for complex real-world tasks.
- [389] arXiv:2402.18381 [ pdf , ps , html , other ]
-
Title: Large Language Models As Evolution StrategiesComments: 11 pages, 14 figuresSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Abstract: Large Transformer models are capable of implementing a plethora of so-called in-context learning algorithms. These include gradient descent, classification, sequence completion, transformation, and improvement. In this work, we investigate whether large language models (LLMs), which never explicitly encountered the task of black-box optimization, are in principle capable of implementing evolutionary optimization algorithms. While previous works have solely focused on language-based task specification, we move forward and focus on the zero-shot application of LLMs to black-box optimization. We introduce a novel prompting strategy, consisting of least-to-most sorting of discretized population members and querying the LLM to propose an improvement to the mean statistic, i.e. perform a type of black-box recombination operation. Empirically, we find that our setup allows the user to obtain an LLM-based evolution strategy, which we call `EvoLLM', that robustly outperforms baseline algorithms such as random search and Gaussian Hill Climbing on synthetic BBOB functions as well as small neuroevolution tasks. Hence, LLMs can act as `plug-in' in-context recombination operators. We provide several comparative studies of the LLM's model size, prompt strategy, and context construction. Finally, we show that one can flexibly improve EvoLLM's performance by providing teacher algorithm information via instruction fine-tuning on previously collected teacher optimization trajectories.
- [390] arXiv:2402.18393 [ pdf , ps , html , other ]
-
Title: Evaluating Decision Optimality of Autonomous Driving via Metamorphic TestingSubjects: Artificial Intelligence (cs.AI) ; Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO); Software Engineering (cs.SE)
Abstract: Autonomous Driving System (ADS) testing is crucial in ADS development, with the current primary focus being on safety. However, the evaluation of non-safety-critical performance, particularly the ADS's ability to make optimal decisions and produce optimal paths for autonomous vehicles (AVs), is equally vital to ensure the intelligence and reduce risks of AVs. Currently, there is little work dedicated to assessing ADSs' optimal decision-making performance due to the lack of corresponding oracles and the difficulty in generating scenarios with non-optimal decisions. In this paper, we focus on evaluating the decision-making quality of an ADS and propose the first method for detecting non-optimal decision scenarios (NoDSs), where the ADS does not compute optimal paths for AVs. Firstly, to deal with the oracle problem, we propose a novel metamorphic relation (MR) aimed at exposing violations of optimal decisions. The MR identifies the property that the ADS should retain optimal decisions when the optimal path remains unaffected by non-invasive changes. Subsequently, we develop a new framework, Decictor, designed to generate NoDSs efficiently. Decictor comprises three main components: Non-invasive Mutation, MR Check, and Feedback. The Non-invasive Mutation ensures that the original optimal path in the mutated scenarios is not affected, while the MR Check is responsible for determining whether non-optimal decisions are made. To enhance the effectiveness of identifying NoDSs, we design a feedback metric that combines both spatial and temporal aspects of the AV's movement. We evaluate Decictor on Baidu Apollo, an open-source and production-grade ADS. The experimental results validate the effectiveness of Decictor in detecting non-optimal decisions of ADSs. Our work provides valuable and original insights into evaluating the non-safety-critical performance of ADSs.
- [391] arXiv:2402.18409 [ pdf , ps , html , other ]
-
Title: A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision Language ModelsSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Large Vision Language Models (LVLMs), despite their recent success, are hardly comprehensively tested for their cognitive abilities. Inspired by the prevalent use of the "Cookie Theft" task in human cognition test, we propose a novel evaluation benchmark to evaluate high-level cognitive ability of LVLMs using images with rich semantics. It defines eight reasoning capabilities and consists of an image description task and a visual question answering task. Our evaluation on well-known LVLMs shows that there is still a large gap in cognitive ability between LVLMs and humans.
- [392] arXiv:2402.18426 [ pdf , ps , html , other ]
-
Title: A Relational Inductive Bias for Dimensional Abstraction in Neural NetworksSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: The human cognitive system exhibits remarkable flexibility and generalization capabilities, partly due to its ability to form low-dimensional, compositional representations of the environment. In contrast, standard neural network architectures often struggle with abstract reasoning tasks, overfitting, and requiring extensive data for training. This paper investigates the impact of the relational bottleneck -- a mechanism that focuses processing on relations among inputs -- on the learning of factorized representations conducive to compositional coding and the attendant flexibility of processing. We demonstrate that such a bottleneck not only improves generalization and learning efficiency, but also aligns network performance with human-like behavioral biases. Networks trained with the relational bottleneck developed orthogonal representations of feature dimensions latent in the dataset, reflecting the factorized structure thought to underlie human cognitive flexibility. Moreover, the relational network mimics human biases towards regularity without pre-specified symbolic primitives, suggesting that the bottleneck fosters the emergence of abstract representations that confer flexibility akin to symbols.
- [393] arXiv:2402.18496 [ pdf , ps , html , other ]
-
Title: Language Models Represent Beliefs of Self and OthersComments: project page: this https URLSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Understanding and attributing mental states, known as Theory of Mind (ToM), emerges as a fundamental capability for human social reasoning. While Large Language Models (LLMs) appear to possess certain ToM abilities, the mechanisms underlying these capabilities remain elusive. In this study, we discover that it is possible to linearly decode the belief status from the perspectives of various agents through neural activations of language models, indicating the existence of internal representations of self and others' beliefs. By manipulating these representations, we observe dramatic changes in the models' ToM performance, underscoring their pivotal role in the social reasoning process. Additionally, our findings extend to diverse social reasoning tasks that involve different causal inference patterns, suggesting the potential generalizability of these representations.
- [394] arXiv:2402.18679 [ pdf , ps , other ]
-
Title: Data Interpreter: An LLM Agent For Data ScienceSirui Hong , Yizhang Lin , Bang Liu , Bangbang Liu , Binhao Wu , Danyang Li , Jiaqi Chen , Jiayi Zhang , Jinlin Wang , Li Zhang , Lingyao Zhang , Min Yang , Mingchen Zhuge , Taicheng Guo , Tuo Zhou , Wei Tao , Wenyi Wang , Xiangru Tang , Xiangtao Lu , Xiawu Zheng , Xinbing Liang , Yaying Fei , Yuheng Cheng , Zongze Xu , Chenglin WuSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Large Language Model (LLM)-based agents have demonstrated remarkable effectiveness. However, their performance can be compromised in data science scenarios that require real-time data adjustment, expertise in optimization due to complex dependencies among various tasks, and the ability to identify logical errors for precise reasoning. In this study, we introduce the Data Interpreter, a solution designed to solve with code that emphasizes three pivotal techniques to augment problem-solving in data science: 1) dynamic planning with hierarchical graph structures for real-time data adaptability;2) tool integration dynamically to enhance code proficiency during execution, enriching the requisite expertise;3) logical inconsistency identification in feedback, and efficiency enhancement through experience recording. We evaluate the Data Interpreter on various data science and real-world tasks. Compared to open-source baselines, it demonstrated superior performance, exhibiting significant improvements in machine learning tasks, increasing from 0.86 to 0.95. Additionally, it showed a 26% increase in the MATH dataset and a remarkable 112% improvement in open-ended tasks. The solution will be released at this https URL .
- [395] arXiv:2402.18715 [ pdf , ps , html , other ]
-
Title: Commonsense Ontology MicropatternsSubjects: Artificial Intelligence (cs.AI) ; Logic in Computer Science (cs.LO)
Abstract: The previously introduced Modular Ontology Modeling methodology (MOMo) attempts to mimic the human analogical process by using modular patterns to assemble more complex concepts. To support this, MOMo organizes organizes ontology design patterns into design libraries, which are programmatically queryable, to support accelerated ontology development, for both human and automated processes. However, a major bottleneck to large-scale deployment of MOMo is the (to-date) limited availability of ready-to-use ontology design patterns. At the same time, Large Language Models have quickly become a source of common knowledge and, in some cases, replacing search engines for questions. In this paper, we thus present a collection of 104 ontology design patterns representing often occurring nouns, curated from the common-sense knowledge available in LLMs, organized into a fully-annotated modular ontology design library ready for use with MOMo.
- [396] arXiv:2402.18732 [ pdf , ps , html , other ]
-
Title: GAIA: Categorical Foundations of Generative AIComments: 65 pages. arXiv admin note: text overlap with arXiv:2212.08981Subjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: In this paper, we propose GAIA, a generative AI architecture based on category theory. GAIA is based on a hierarchical model where modules are organized as a simplicial complex. Each simplicial complex updates its internal parameters biased on information it receives from its superior simplices and in turn relays updates to its subordinate sub-simplices. Parameter updates are formulated in terms of lifting diagrams over simplicial sets, where inner and outer horn extensions correspond to different types of learning problems. Backpropagation is modeled as an endofunctor over the category of parameters, leading to a coalgebraic formulation of deep learning.
- [397] arXiv:2402.18743 [ pdf , ps , html , other ]
-
Title: A revision on Multi-Criteria Decision Making methods for Multi-UAV Mission Planning SupportComments: Preprint submitted and acepted in Expert Systems with ApplicationsJournal-ref: Expert Systems with Applications, Volume 160, 2020, 113708Subjects: Artificial Intelligence (cs.AI) ; Robotics (cs.RO)
Abstract: Over the last decade, Unmanned Aerial Vehicles (UAVs) have been extensively used in many commercial applications due to their manageability and risk avoidance. One of the main problems considered is the Mission Planning for multiple UAVs, where a solution plan must be found satisfying the different constraints of the problem. This problem has multiple variables that must be optimized simultaneously, such as the makespan, the cost of the mission or the risk. Therefore, the problem has a lot of possible optimal solutions, and the operator must select the final solution to be executed among them. In order to reduce the workload of the operator in this decision process, a Decision Support System (DSS) becomes necessary. In this work, a DSS consisting of ranking and filtering systems, which order and reduce the optimal solutions, has been designed. With regard to the ranking system, a wide range of Multi-Criteria Decision Making (MCDM) methods, including some fuzzy MCDM, are compared on a multi-UAV mission planning scenario, in order to study which method could fit better in a multi-UAV decision support system. Expert operators have evaluated the solutions returned, and the results show, on the one hand, that fuzzy methods generally achieve better average scores, and on the other, that all of the tested methods perform better when the preferences of the operators are biased towards a specific variable, and worse when their preferences are balanced. For the filtering system, a similarity function based on the proximity of the solutions has been designed, and on top of that, a threshold is tuned empirically to decide how to filter solutions without losing much of the hypervolume of the space of solutions.
- [398] arXiv:2402.18784 [ pdf , ps , html , other ]
-
Title: Brain-inspired and Self-based Artificial IntelligenceYi Zeng , Feifei Zhao , Yuxuan Zhao , Dongcheng Zhao , Enmeng Lu , Qian Zhang , Yuwei Wang , Hui Feng , Zhuoya Zhao , Jihang Wang , Qingqun Kong , Yinqian Sun , Yang Li , Guobin Shen , Bing Han , Yiting Dong , Wenxuan Pan , Xiang He , Aorigele Bao , Jin WangSubjects: Artificial Intelligence (cs.AI) ; Neurons and Cognition (q-bio.NC)
Abstract: The question "Can machines think?" and the Turing Test to assess whether machines could achieve human-level intelligence is one of the roots of AI. With the philosophical argument "I think, therefore I am", this paper challenge the idea of a "thinking machine" supported by current AIs since there is no sense of self in them. Current artificial intelligence is only seemingly intelligent information processing and does not truly understand or be subjectively aware of oneself and perceive the world with the self as human intelligence does. In this paper, we introduce a Brain-inspired and Self-based Artificial Intelligence (BriSe AI) paradigm. This BriSe AI paradigm is dedicated to coordinating various cognitive functions and learning strategies in a self-organized manner to build human-level AI models and robotic applications. Specifically, BriSe AI emphasizes the crucial role of the Self in shaping the future AI, rooted with a practical hierarchical Self framework, including Perception and Learning, Bodily Self, Autonomous Self, Social Self, and Conceptual Self. The hierarchical framework of the Self highlights self-based environment perception, self-bodily modeling, autonomous interaction with the environment, social interaction and collaboration with others, and even more abstract understanding of the Self. Furthermore, the positive mutual promotion and support among multiple levels of Self, as well as between Self and learning, enhance the BriSe AI's conscious understanding of information and flexible adaptation to complex environments, serving as a driving force propelling BriSe AI towards real Artificial General Intelligence.
- [399] arXiv:2402.19195 [ pdf , ps , html , other ]
-
Title: Negative Sampling in Knowledge Graph Representation Learning: A ReviewSubjects: Artificial Intelligence (cs.AI)
Abstract: Knowledge graph representation learning (KGRL) or knowledge graph embedding (KGE) plays a crucial role in AI applications for knowledge construction and information exploration. These models aim to encode entities and relations present in a knowledge graph into a lower-dimensional vector space. During the training process of KGE models, using positive and negative samples becomes essential for discrimination purposes. However, obtaining negative samples directly from existing knowledge graphs poses a challenge, emphasizing the need for effective generation techniques. The quality of these negative samples greatly impacts the accuracy of the learned embeddings, making their generation a critical aspect of KGRL. This comprehensive survey paper systematically reviews various negative sampling (NS) methods and their contributions to the success of KGRL. Their respective advantages and disadvantages are outlined by categorizing existing NS methods into five distinct categories. Moreover, this survey identifies open research questions that serve as potential directions for future investigations. By offering a generalization and alignment of fundamental NS concepts, this survey provides valuable insights for designing effective NS methods in the context of KGRL and serves as a motivating force for further advancements in the field.
- [400] arXiv:2402.19251 [ pdf , ps , html , other ]
-
Title: A Cognitive-Based Trajectory Prediction Approach for Autonomous DrivingHaicheng Liao , Yongkang Li , Zhenning Li , Chengyue Wang , Zhiyong Cui , Shengbo Eben Li , Chengzhong XuSubjects: Artificial Intelligence (cs.AI) ; Robotics (cs.RO)
Abstract: In autonomous vehicle (AV) technology, the ability to accurately predict the movements of surrounding vehicles is paramount for ensuring safety and operational efficiency. Incorporating human decision-making insights enables AVs to more effectively anticipate the potential actions of other vehicles, significantly improving prediction accuracy and responsiveness in dynamic environments. This paper introduces the Human-Like Trajectory Prediction (HLTP) model, which adopts a teacher-student knowledge distillation framework inspired by human cognitive processes. The HLTP model incorporates a sophisticated teacher-student knowledge distillation framework. The "teacher" model, equipped with an adaptive visual sector, mimics the visual processing of the human brain, particularly the functions of the occipital and temporal lobes. The "student" model focuses on real-time interaction and decision-making, drawing parallels to prefrontal and parietal cortex functions. This approach allows for dynamic adaptation to changing driving scenarios, capturing essential perceptual cues for accurate prediction. Evaluated using the Macao Connected and Autonomous Driving (MoCAD) dataset, along with the NGSIM and HighD benchmarks, HLTP demonstrates superior performance compared to existing models, particularly in challenging environments with incomplete data. The project page is available at Github.
- [401] arXiv:2402.19265 [ pdf , ps , html , other ]
-
Title: Learning Logic Specifications for Policy Guidance in POMDPs: an Inductive Logic Programming ApproachJournal-ref: Journal of Artificial Intelligence Research, volume 79 (2024), pp. 725-776Subjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Logic in Computer Science (cs.LO)
Abstract: Partially Observable Markov Decision Processes (POMDPs) are a powerful framework for planning under uncertainty. They allow to model state uncertainty as a belief probability distribution. Approximate solvers based on Monte Carlo sampling show great success to relax the computational demand and perform online planning. However, scaling to complex realistic domains with many actions and long planning horizons is still a major challenge, and a key point to achieve good performance is guiding the action-selection process with domain-dependent policy heuristics which are tailored for the specific application domain. We propose to learn high-quality heuristics from POMDP traces of executions generated by any solver. We convert the belief-action pairs to a logical semantics, and exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications, which are then used as online heuristics. We evaluate thoroughly our methodology on two notoriously challenging POMDP problems, involving large action spaces and long planning horizons, namely, rocksample and pocman. Considering different state-of-the-art online POMDP solvers, including POMCP, DESPOT and AdaOPS, we show that learned heuristics expressed in Answer Set Programming (ASP) yield performance superior to neural networks and similar to optimal handcrafted task-specific heuristics within lower computational time. Moreover, they well generalize to more challenging scenarios not experienced in the training phase (e.g., increasing rocks and grid size in rocksample, incrementing the size of the map and the aggressivity of ghosts in pocman).
- [402] arXiv:2402.19299 [ pdf , ps , html , other ]
-
Title: RL-GPT: Integrating Reinforcement Learning and Code-as-policySubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Large Language Models (LLMs) have demonstrated proficiency in utilizing various tools by coding, yet they face limitations in handling intricate logic and precise control. In embodied tasks, high-level planning is amenable to direct coding, while low-level actions often necessitate task-specific refinement, such as Reinforcement Learning (RL). To seamlessly integrate both modalities, we introduce a two-level hierarchical framework, RL-GPT, comprising a slow agent and a fast agent. The slow agent analyzes actions suitable for coding, while the fast agent executes coding tasks. This decomposition effectively focuses each agent on specific tasks, proving highly efficient within our pipeline. Our approach outperforms traditional RL methods and existing GPT agents, demonstrating superior efficiency. In the Minecraft game, it rapidly obtains diamonds within a single day on an RTX3090. Additionally, it achieves SOTA performance across all designated MineDojo tasks.
- [403] arXiv:2402.19450 [ pdf , ps , html , other ]
-
Title: Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning GapSaurabh Srivastava , Annarose M B , Anto P V , Shashank Menon , Ajay Sukumar , Adwaith Samod T , Alan Philipose , Stevin Prince , Sooraj ThomasComments: 37 pages, 10 figuresSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: We propose a framework for robust evaluation of reasoning capabilities of language models, using functional variants of benchmarks. Models that solve a reasoning test should exhibit no difference in performance over the static version of a problem compared to a snapshot of the functional variant. We have rewritten the relevant fragment of the MATH benchmark into its functional variant MATH(), with functionalization of other benchmarks to follow. When evaluating current state-of-the-art models over snapshots of MATH(), we find a reasoning gap -- the percentage difference between the static and functional accuracies. We find reasoning gaps from 58.35% to 80.31% among the state-of-the-art closed and open weights models that perform well on static benchmarks, with the caveat that the gaps are likely to be smaller with more sophisticated prompting strategies. Here we show that models which anecdotally have good reasoning performance over real-world tasks, have quantifiable lower gaps, motivating the open problem of building "gap 0" models. Code for evaluation and new evaluation datasets, three MATH() snapshots, are publicly available at this https URL .
- [404] arXiv:2402.00017 (cross-list from cs.CY) [ pdf , ps , html , other ]
-
Title: Deploying ADVISER: Impact and Lessons from Using Artificial Intelligence for Child Vaccination Uptake in NigeriaOpadele Kehinde , Ruth Abdul , Bose Afolabi , Parminder Vir , Corinne Namblard , Ayan Mukhopadhyay , Abiodun AdereniComments: Accepted for publication at the AAAI Conference on Artificial Intelligence (AAAI-24)Subjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: More than 5 million children under five years die from largely preventable or treatable medical conditions every year, with an overwhelmingly large proportion of deaths occurring in underdeveloped countries with low vaccination uptake. One of the United Nations' sustainable development goals (SDG 3) aims to end preventable deaths of newborns and children under five years of age. We focus on Nigeria, where the rate of infant mortality is appalling. In particular, low vaccination uptake in Nigeria is a major driver of more than 2,000 daily deaths of children under the age of five years. In this paper, we describe our collaboration with government partners in Nigeria to deploy ADVISER: AI-Driven Vaccination Intervention Optimiser. The framework, based on an integer linear program that seeks to maximize the cumulative probability of successful vaccination, is the first successful deployment of an AI-enabled toolchain for optimizing the allocation of health interventions in Nigeria. In this paper, we provide a background of the ADVISER framework and present results, lessons, and success stories of deploying ADVISER to more than 13,000 families in the state of Oyo, Nigeria.
- [405] arXiv:2402.00024 (cross-list from q-bio.BM) [ pdf , ps , html , other ]
-
Title: Comparative Analysis of LLaMA and ChatGPT Embeddings for Molecule EmbeddingSubjects: Biomolecules (q-bio.BM) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Purpose: Large Language Models (LLMs) like ChatGPT and LLaMA are increasingly recognized for their potential in the field of cheminformatics, particularly in interpreting Simplified Molecular Input Line Entry System (SMILES), a standard method for representing chemical structures. These LLMs can decode SMILES strings into vector representations, providing a novel approach to understanding chemical graphs.
Methods: We investigate the performance of ChatGPT and LLaMA in embedding SMILES strings. Our evaluation focuses on two key applications: molecular property (MP) prediction and drug-drug interaction (DDI) prediction, both essential in drug development and healthcare.
Results: We find that SMILES embeddings generated using LLaMA outperform those from ChatGPT in both MP and DDI prediction tasks. Notably, LLaMA-based SMILES embeddings show results comparable to existing methods in both prediction tasks.
Conclusion: The application of LLMs in cheminformatics, particularly in utilizing SMILES embeddings, shows significant promise for advancing drug development. This includes improving the prediction of chemical properties and facilitating the drug discovery process. GitHub: this https URL - [406] arXiv:2402.00025 (cross-list from cs.DC) [ pdf , ps , html , other ]
-
Title: Accelerating a Triton Fused Kernel for W4A16 Quantized Inference with SplitK work decompositionSubjects: Distributed, Parallel, and Cluster Computing (cs.DC) ; Artificial Intelligence (cs.AI)
Abstract: We propose an implementation of an efficient fused matrix multiplication kernel for W4A16 quantized inference, where we perform dequantization and GEMM in a fused kernel using a SplitK work decomposition. Our implementation shows improvement for the type of skinny matrix-matrix multiplications found in foundation model inference workloads. In particular, this paper surveys the type of matrix multiplication between a skinny activation matrix and a square weight matrix. Our results show an average of 65% speed improvement on A100, and an average of 124% speed improvement on H100 (with a peak of 295%) for a range of matrix dimensions including those found in a llama-style model, where m < n = k.
- [407] arXiv:2402.00029 (cross-list from cs.CY) [ pdf , ps , html , other ]
-
Title: Exploring Public Opinion on Responsible AI Through The Lens of Cultural Consensus TheoryJournal-ref: Proceedings of the 57th Hawaii International Conference on System Sciences, 713-723 (2024)Subjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: As the societal implications of Artificial Intelligence (AI) continue to grow, the pursuit of responsible AI necessitates public engagement in its development and governance processes. This involvement is crucial for capturing diverse perspectives and promoting equitable practices and outcomes. We applied Cultural Consensus Theory (CCT) to a nationally representative survey dataset on various aspects of AI to discern beliefs and attitudes about responsible AI in the United States. Our results offer valuable insights by identifying shared and contrasting views on responsible AI. Furthermore, these findings serve as critical reference points for developers and policymakers, enabling them to more effectively consider individual variances and group-level cultural perspectives when making significant decisions and addressing the public's concerns.
- [408] arXiv:2402.00030 (cross-list from cs.NE) [ pdf , ps , html , other ]
-
Title: Evolution-Bootstrapped Simulation: Artificial or Human Intelligence: Which Came First?Comments: 6 pages, no figuresSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI); Populations and Evolution (q-bio.PE)
Abstract: Humans have created artificial intelligence (AI), not the other way around. This statement is deceptively obvious. In this note, we decided to challenge this statement as a small, lighthearted Gedankenexperiment. We ask a simple question: in a world driven by evolution by natural selection, would neural networks or humans be likely to evolve first? We compare the Solomonoff--Kolmogorov--Chaitin complexity of the two and find neural networks (even LLMs) to be significantly simpler than humans. Further, we claim that it is unnecessary for any complex human-made equipment to exist for there to be neural networks. Neural networks may have evolved as naturally occurring objects before humans did as a form of chemical reaction-based or enzyme-based computation. Now that we know that neural networks can pass the Turing test and suspect that they may be capable of superintelligence, we ask whether the natural evolution of neural networks could lead from pure evolution by natural selection to what we call evolution-bootstrapped simulation. The evolution of neural networks does not involve irreducible complexity; would easily allow irreducible complexity to exist in the evolution-bootstrapped simulation; is a falsifiable scientific hypothesis; and is independent of / orthogonal to the issue of intelligent design.
- [409] arXiv:2402.00031 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: An Integrated Framework for Team Formation and Winner Prediction in the FIRST Robotics Competition: Model, Algorithm, and AnalysisSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: This research work aims to develop an analytical approach for optimizing team formation and predicting team performance in a competitive environment based on data on the competitors' skills prior to the team formation. There are several approaches in scientific literature to optimize and predict a team's performance. However, most studies employ fine-grained skill statistics of the individual members or constraints such as teams with a set group of members. Currently, no research tackles the highly constrained domain of the FIRST Robotics Competition. This research effort aims to fill this gap by providing an analytical method for optimizing and predicting team performance in a competitive environment while allowing these constraints and only using metrics on previous team performance, not on each individual member's performance. We apply our method to the drafting process of the FIRST Robotics competition, a domain in which the skills change year-over-year, team members change throughout the season, each match only has a superficial set of statistics, and alliance formation is key to competitive success. First, we develop a method that could extrapolate individual members' performance based on overall team performance. An alliance optimization algorithm is developed to optimize team formation and a deep neural network model is trained to predict the winning team, both using highly post-processed real-world data. Our method is able to successfully extract individual members' metrics from overall team statistics, form competitive teams, and predict the winning team with 84.08% accuracy.
- [410] arXiv:2402.00033 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: LF-ViT: Reducing Spatial Redundancy in Vision Transformer for Efficient Image RecognitionSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: The Vision Transformer (ViT) excels in accuracy when handling high-resolution images, yet it confronts the challenge of significant spatial redundancy, leading to increased computational and memory requirements. To address this, we present the Localization and Focus Vision Transformer (LF-ViT). This model operates by strategically curtailing computational demands without impinging on performance. In the Localization phase, a reduced-resolution image is processed; if a definitive prediction remains elusive, our pioneering Neighborhood Global Class Attention (NGCA) mechanism is triggered, effectively identifying and spotlighting class-discriminative regions based on initial findings. Subsequently, in the Focus phase, this designated region is used from the original image to enhance recognition. Uniquely, LF-ViT employs consistent parameters across both phases, ensuring seamless end-to-end optimization. Our empirical tests affirm LF-ViT's prowess: it remarkably decreases Deit-S's FLOPs by 63\% and concurrently amplifies throughput twofold. Code of this project is at this https URL .
- [411] arXiv:2402.00034 (cross-list from cs.DC) [ pdf , ps , html , other ]
-
Title: Why does Prediction Accuracy Decrease over Time? Uncertain Positive Learning for Cloud Failure PredictionHaozhe Li , Minghua Ma , Yudong Liu , Pu Zhao , Lingling Zheng , Ze Li , Yingnong Dang , Murali Chintalapati , Saravan Rajmohan , Qingwei Lin , Dongmei ZhangSubjects: Distributed, Parallel, and Cluster Computing (cs.DC) ; Artificial Intelligence (cs.AI)
Abstract: With the rapid growth of cloud computing, a variety of software services have been deployed in the cloud. To ensure the reliability of cloud services, prior studies focus on failure instance (disk, node, and switch, etc.) prediction. Once the output of prediction is positive, mitigation actions are taken to rapidly resolve the underlying failure. According to our real-world practice in Microsoft Azure, we find that the prediction accuracy may decrease by about 9% after retraining the models. Considering that the mitigation actions may result in uncertain positive instances since they cannot be verified after mitigation, which may introduce more noise while updating the prediction model. To the best of our knowledge, we are the first to identify this Uncertain Positive Learning (UPLearning) issue in the real-world cloud failure prediction scenario. To tackle this problem, we design an Uncertain Positive Learning Risk Estimator (Uptake) approach. Using two real-world datasets of disk failure prediction and conducting node prediction experiments in Microsoft Azure, which is a top-tier cloud provider that serves millions of users, we demonstrate Uptake can significantly improve the failure prediction accuracy by 5% on average.
- [412] arXiv:2402.00037 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Catalyzing Equity in STEM Teams: Harnessing Generative AI for Inclusion and DiversityComments: 21 pages, 0 figure, to be published in Policy Insights from Behavioral and Brain SciencesSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: Collaboration is key to STEM, where multidisciplinary team research can solve complex problems. However, inequality in STEM fields hinders their full potential, due to persistent psychological barriers in underrepresented students' experience. This paper documents teamwork in STEM and explores the transformative potential of computational modeling and generative AI in promoting STEM-team diversity and inclusion. Leveraging generative AI, this paper outlines two primary areas for advancing diversity, equity, and inclusion. First, formalizing collaboration assessment with inclusive analytics can capture fine-grained learner behavior. Second, adaptive, personalized AI systems can support diversity and inclusion in STEM teams. Four policy recommendations highlight AI's capacity: formalized collaborative skill assessment, inclusive analytics, funding for socio-cognitive research, human-AI teaming for inclusion training. Researchers, educators, policymakers can build an equitable STEM ecosystem. This roadmap advances AI-enhanced collaboration, offering a vision for the future of STEM where diverse voices are actively encouraged and heard within collaborative scientific endeavors.
- [413] arXiv:2402.00045 (cross-list from cs.MM) [ pdf , ps , other ]
-
Title: Detecting Multimedia Generated by Large AI Models: A SurveyLi Lin , Neeraj Gupta , Yue Zhang , Hainan Ren , Chun-Hao Liu , Feng Ding , Xin Wang , Xin Li , Luisa Verdoliva , Shu HuSubjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The rapid advancement of Large AI Models (LAIMs), particularly diffusion models and large language models, has marked a new era where AI-generated multimedia is increasingly integrated into various aspects of daily life. Although beneficial in numerous fields, this content presents significant risks, including potential misuse, societal disruptions, and ethical concerns. Consequently, detecting multimedia generated by LAIMs has become crucial, with a marked rise in related research. Despite this, there remains a notable gap in systematic surveys that focus specifically on detecting LAIM-generated multimedia. Addressing this, we provide the first survey to comprehensively cover existing research on detecting multimedia (such as text, images, videos, audio, and multimodal content) created by LAIMs. Specifically, we introduce a novel taxonomy for detection methods, categorized by media modality, and aligned with two perspectives: pure detection (aiming to enhance detection performance) and beyond detection (adding attributes like generalizability, robustness, and interpretability to detectors). Additionally, we have presented a brief overview of generation mechanisms, public datasets, and online detection tools to provide a valuable resource for researchers and practitioners in this field. Furthermore, we identify current challenges in detection and propose directions for future research that address unexplored, ongoing, and emerging issues in detecting multimedia generated by LAIMs. Our aim for this survey is to fill an academic gap and contribute to global AI security efforts, helping to ensure the integrity of information in the digital realm. The project link is this https URL .
- [414] arXiv:2402.00059 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: FengWu-GHR: Learning the Kilometer-scale Medium-range Global Weather ForecastingTao Han , Song Guo , Fenghua Ling , Kang Chen , Junchao Gong , Jingjia Luo , Junxia Gu , Kan Dai , Wanli Ouyang , Lei BaiComments: 19 pagesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Atmospheric and Oceanic Physics (physics.ao-ph)
Abstract: Kilometer-scale modeling of global atmosphere dynamics enables fine-grained weather forecasting and decreases the risk of disastrous weather and climate activity. Therefore, building a kilometer-scale global forecast model is a persistent pursuit in the meteorology domain. Active international efforts have been made in past decades to improve the spatial resolution of numerical weather models. Nonetheless, developing the higher resolution numerical model remains a long-standing challenge due to the substantial consumption of computational resources. Recent advances in data-driven global weather forecasting models utilize reanalysis data for model training and have demonstrated comparable or even higher forecasting skills than numerical models. However, they are all limited by the resolution of reanalysis data and incapable of generating higher-resolution forecasts. This work presents FengWu-GHR, the first data-driven global weather forecasting model running at the 0.09$^{\circ}$ horizontal resolution. FengWu-GHR introduces a novel approach that opens the door for operating ML-based high-resolution forecasts by inheriting prior knowledge from a pretrained low-resolution model. The hindcast of weather prediction in 2022 indicates that FengWu-GHR is superior to the IFS-HRES. Furthermore, evaluations on station observations and case studies of extreme events support the competitive operational forecasting skill of FengWu-GHR at the high resolution.
- [415] arXiv:2402.00066 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: TrackGPT -- A generative pre-trained transformer for cross-domain entity trajectory forecastingComments: 16 pages, 8 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The forecasting of entity trajectories at future points in time is a critical capability gap in applications across both Commercial and Defense sectors. Transformers, and specifically Generative Pre-trained Transformer (GPT) networks have recently revolutionized several fields of Artificial Intelligence, most notably Natural Language Processing (NLP) with the advent of Large Language Models (LLM) like OpenAI's ChatGPT. In this research paper, we introduce TrackGPT, a GPT-based model for entity trajectory forecasting that has shown utility across both maritime and air domains, and we expect to perform well in others. TrackGPT stands as a pioneering GPT model capable of producing accurate predictions across diverse entity time series datasets, demonstrating proficiency in generating both long-term forecasts with sustained accuracy and short-term forecasts with high precision. We present benchmarks against state-of-the-art deep learning techniques, showing that TrackGPT's forecasting capability excels in terms of accuracy, reliability, and modularity. Importantly, TrackGPT achieves these results while remaining domain-agnostic and requiring minimal data features (only location and time) compared to models achieving similar performance. In conclusion, our findings underscore the immense potential of applying GPT architectures to the task of entity trajectory forecasting, exemplified by the innovative TrackGPT model.
- [416] arXiv:2402.00068 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: GPT4Battery: An LLM-driven Framework for Adaptive State of Health Estimation of Raw Li-ion BatteriesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: State of health (SOH) is a crucial indicator for assessing the degradation level of batteries that cannot be measured directly but requires estimation. Accurate SOH estimation enhances detection, control, and feedback for Li-ion batteries, allowing for safe and efficient energy management and guiding the development of new-generation batteries. Despite the significant progress in data-driven SOH estimation, the time and resource-consuming degradation experiments for generating lifelong training data pose a challenge in establishing one large model capable of handling diverse types of Li-ion batteries, e.g., cross-chemistry, cross-manufacturer, and cross-capacity. Hence, this paper utilizes the strong generalization capability of large language model (LLM) to proposes a novel framework for adaptable SOH estimation across diverse batteries. To match the real scenario where unlabeled data sequentially arrives in use with distribution shifts, the proposed model is modified by a test-time training technique to ensure estimation accuracy even at the battery's end of life. The validation results demonstrate that the proposed framework achieves state-of-the-art accuracy on four widely recognized datasets collected from 62 batteries. Furthermore, we analyze the theoretical challenges of cross-battery estimation and provide a quantitative explanation of the effectiveness of our method.
- [417] arXiv:2402.00069 (cross-list from cs.AR) [ pdf , ps , html , other ]
-
Title: Using the Abstract Computer Architecture Description Language to Model AI Hardware AcceleratorsMika Markus Müller , Alexander Richard Manfred Borst , Konstantin Lübeck , Alexander Louis-Ferdinand Jung , Oliver BringmannComments: Accepted Version for: MBMV'24Subjects: Hardware Architecture (cs.AR) ; Artificial Intelligence (cs.AI)
Abstract: Artificial Intelligence (AI) has witnessed remarkable growth, particularly through the proliferation of Deep Neural Networks (DNNs). These powerful models drive technological advancements across various domains. However, to harness their potential in real-world applications, specialized hardware accelerators are essential. This demand has sparked a market for parameterizable AI hardware accelerators offered by different vendors.
Manufacturers of AI-integrated products face a critical challenge: selecting an accelerator that aligns with their product's performance requirements. The decision involves choosing the right hardware and configuring a suitable set of parameters. However, comparing different accelerator design alternatives remains a complex task. Often, engineers rely on data sheets, spreadsheet calculations, or slow black-box simulators, which only offer a coarse understanding of the performance characteristics.
The Abstract Computer Architecture Description Language (ACADL) is a concise formalization of computer architecture block diagrams, which helps to communicate computer architecture on different abstraction levels and allows for inferring performance characteristics. In this paper, we demonstrate how to use the ACADL to model AI hardware accelerators, use their ACADL description to map DNNs onto them, and explain the timing simulation semantics to gather performance results. - [418] arXiv:2402.00070 (cross-list from cs.NE) [ pdf , ps , html , other ]
-
Title: EvoMerge: Neuroevolution for Large Language ModelsComments: The current submission is the first draft, published for the sole purpose of sharing an idea and encouraging community effort. A more consolidated version may come laterSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Extensive fine-tuning on Large Language Models does not always yield better results. Oftentimes, models tend to get better at imitating one form of data without gaining greater reasoning ability and may even end up losing some intelligence. Here I introduce EvoMerge, a systematic approach to large language model training and merging. Leveraging model merging for weight crossover and fine-tuning for weight mutation, EvoMerge establishes an evolutionary process aimed at pushing models beyond the limits of conventional fine-tuning.
- [419] arXiv:2402.00084 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: EPSD: Early Pruning with Self-Distillation for Efficient Model CompressionDong Chen , Ning Liu , Yichen Zhu , Zhengping Che , Rui Ma , Fachao Zhang , Xiaofeng Mou , Yi Chang , Jian TangComments: The first two authors are with equal contributions. Paper accepted by AAAI 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Neural network compression techniques, such as knowledge distillation (KD) and network pruning, have received increasing attention. Recent work `Prune, then Distill' reveals that a pruned student-friendly teacher network can benefit the performance of KD. However, the conventional teacher-student pipeline, which entails cumbersome pre-training of the teacher and complicated compression steps, makes pruning with KD less efficient. In addition to compressing models, recent compression techniques also emphasize the aspect of efficiency. Early pruning demands significantly less computational cost in comparison to the conventional pruning methods as it does not require a large pre-trained model. Likewise, a special case of KD, known as self-distillation (SD), is more efficient since it requires no pre-training or student-teacher pair selection. This inspires us to collaborate early pruning with SD for efficient model compression. In this work, we propose the framework named Early Pruning with Self-Distillation (EPSD), which identifies and preserves distillable weights in early pruning for a given SD task. EPSD efficiently combines early pruning and self-distillation in a two-step process, maintaining the pruned network's trainability for compression. Instead of a simple combination of pruning and SD, EPSD enables the pruned network to favor SD by keeping more distillable weights before training to ensure better distillation of the pruned network. We demonstrated that EPSD improves the training of pruned networks, supported by visual and quantitative analyses. Our evaluation covered diverse benchmarks (CIFAR-10/100, Tiny-ImageNet, full ImageNet, CUB-200-2011, and Pascal VOC), with EPSD outperforming advanced pruning and SD techniques.
- [420] arXiv:2402.00085 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Scheduled Curiosity-Deep Dyna-Q: Efficient Exploration for Dialog Policy LearningComments: 18 pages, 11 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Training task-oriented dialog agents based on reinforcement learning is time-consuming and requires a large number of interactions with real users. How to grasp dialog policy within limited dialog experiences remains an obstacle that makes the agent training process less efficient. In addition, most previous frameworks start training by randomly choosing training samples, which differs from the human learning method and hurts the efficiency and stability of training. Therefore, we propose Scheduled Curiosity-Deep Dyna-Q (SC-DDQ), a curiosity-driven curriculum learning framework based on a state-of-the-art model-based reinforcement learning dialog model, Deep Dyna-Q (DDQ). Furthermore, we designed learning schedules for SC-DDQ and DDQ, respectively, following two opposite training strategies: classic curriculum learning and its reverse version. Our results show that by introducing scheduled learning and curiosity, the new framework leads to a significant improvement over the DDQ and Deep Q-learning(DQN). Surprisingly, we found that traditional curriculum learning was not always effective. Specifically, according to the experimental results, the easy-first and difficult-first strategies are more suitable for SC-DDQ and DDQ. To analyze our results, we adopted the entropy of sampled actions to depict action exploration and found that training strategies with high entropy in the first stage and low entropy in the last stage lead to better performance.
- [421] arXiv:2402.00086 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Retrosynthesis prediction enhanced by in-silico reaction data augmentationSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Recent advances in machine learning (ML) have expedited retrosynthesis research by assisting chemists to design experiments more efficiently. However, all ML-based methods consume substantial amounts of paired training data (i.e., chemical reaction: product-reactant(s) pair), which is costly to obtain. Moreover, companies view reaction data as a valuable asset and restrict the accessibility to researchers. These issues prevent the creation of more powerful retrosynthesis models due to their data-driven nature. As a response, we exploit easy-to-access unpaired data (i.e., one component of product-reactant(s) pair) for generating in-silico paired data to facilitate model training. Specifically, we present RetroWISE, a self-boosting framework that employs a base model inferred from real paired data to perform in-silico reaction generation and augmentation using unpaired data, ultimately leading to a superior model. On three benchmark datasets, RetroWISE achieves the best overall performance against state-of-the-art models (e.g., +8.6% top-1 accuracy on the USPTO-50K test dataset). Moreover, it consistently improves the prediction accuracy of rare transformations. These results show that Retro- WISE overcomes the training bottleneck by in-silico reactions, thereby paving the way toward more effective ML-based retrosynthesis models.
- [422] arXiv:2402.00089 (cross-list from cs.NE) [ pdf , ps , other ]
-
Title: SCAPE: Searching Conceptual Architecture Prompts using EvolutionComments: 8 pagesJournal-ref: IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence 2024), Yokohama, JapanSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI)
Abstract: Conceptual architecture involves a highly creative exploration of novel ideas, often taken from other disciplines as architects consider radical new forms, materials, textures and colors for buildings. While today's generative AI systems can produce remarkable results, they lack the creativity demonstrated for decades by evolutionary algorithms. SCAPE, our proposed tool, combines evolutionary search with generative AI, enabling users to explore creative and good quality designs inspired by their initial input through a simple point and click interface. SCAPE injects randomness into generative AI, and enables memory, making use of the built-in language skills of GPT-4 to vary prompts via text-based mutation and crossover. We demonstrate that compared to DALL-E 3, SCAPE enables a 67% improvement in image novelty, plus improvements in quality and effectiveness of use; we show that in just three iterations SCAPE has a 24% image novelty increase enabling effective exploration, plus optimization of images by users. We use more than 20 independent architects to assess SCAPE, who provide markedly positive feedback.
- [423] arXiv:2402.00092 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Episodic-free Task Selection for Few-shot LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Episodic training is a mainstream training strategy for few-shot learning. In few-shot scenarios, however, this strategy is often inferior to some non-episodic training strategy, e. g., Neighbourhood Component Analysis (NCA), which challenges the principle that training conditions must match testing conditions. Thus, a question is naturally asked: How to search for episodic-free tasks for better few-shot learning? In this work, we propose a novel meta-training framework beyond episodic training. In this framework, episodic tasks are not used directly for training, but for evaluating the effectiveness of some selected episodic-free tasks from a task set that are performed for training the meta-learners. The selection criterion is designed with the affinity, which measures the degree to which loss decreases when executing the target tasks after training with the selected tasks. In experiments, the training task set contains some promising types, e. g., contrastive learning and classification, and the target few-shot tasks are achieved with the nearest centroid classifiers on the miniImageNet, tiered-ImageNet and CIFAR-FS datasets. The experimental results demonstrate the effectiveness of our approach.
- [424] arXiv:2402.00094 (cross-list from cs.NE) [ pdf , ps , html , other ]
-
Title: Deep Neural Networks: A Formulation Via Non-Archimedean AnalysisSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: We introduce a new class of deep neural networks (DNNs) with multilayered tree-like architectures. The architectures are codified using numbers from the ring of integers of non-Archimdean local fields. These rings have a natural hierarchical organization as infinite rooted trees. Natural morphisms on these rings allow us to construct finite multilayered architectures. The new DNNs are robust universal approximators of real-valued functions defined on the mentioned rings. We also show that the DNNs are robust universal approximators of real-valued square-integrable functions defined in the unit interval.
- [425] arXiv:2402.00135 (cross-list from cs.RO) [ pdf , ps , html , other ]
-
Title: A Reinforcement Learning Based Controller to Minimize Forces on the Crutches of a Lower-Limb ExoskeletonComments: 6 pages, 5 FiguresSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract: Metabolic energy consumption of a powered lower-limb exoskeleton user mainly comes from the upper body effort since the lower body is considered to be passive. However, the upper body effort of the users is largely ignored in the literature when designing motion controllers. In this work, we use deep reinforcement learning to develop a locomotion controller that minimizes ground reaction forces (GRF) on crutches. The rationale for minimizing GRF is to reduce the upper body effort of the user. Accordingly, we design a model and a learning framework for a human-exoskeleton system with crutches. We formulate a reward function to encourage the forward displacement of a human-exoskeleton system while satisfying the predetermined constraints of a physical robot. We evaluate our new framework using Proximal Policy Optimization, a state-of-the-art deep reinforcement learning (RL) method, on the MuJoCo physics simulator with different hyperparameters and network architectures over multiple trials. We empirically show that our learning model can generate joint torques based on the joint angle, velocities, and the GRF on the feet and crutch tips. The resulting exoskeleton model can directly generate joint torques from states in line with the RL framework. Finally, we empirically show that policy trained using our method can generate a gait with a 35% reduction in GRF with respect to the baseline.
- [426] arXiv:2402.00138 (cross-list from cs.DS) [ pdf , ps , html , other ]
-
Title: Decomposable Submodular Maximization in Federated SettingSubjects: Data Structures and Algorithms (cs.DS) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Abstract: Submodular functions, as well as the sub-class of decomposable submodular functions, and their optimization appear in a wide range of applications in machine learning, recommendation systems, and welfare maximization. However, optimization of decomposable submodular functions with millions of component functions is computationally prohibitive. Furthermore, the component functions may be private (they might represent user preference function, for example) and cannot be widely shared. To address these issues, we propose a {\em federated optimization} setting for decomposable submodular optimization. In this setting, clients have their own preference functions, and a weighted sum of these preferences needs to be maximized. We implement the popular {\em continuous greedy} algorithm in this setting where clients take parallel small local steps towards the local solution and then the local changes are aggregated at a central server. To address the large number of clients, the aggregation is performed only on a subsampled set. Further, the aggregation is performed only intermittently between stretches of parallel local steps, which reduces communication cost significantly. We show that our federated algorithm is guaranteed to provide a good approximate solution, even in the presence of above cost-cutting measures. Finally, we show how the federated setting can be incorporated in solving fundamental discrete submodular optimization problems such as Maximum Coverage and Facility Location.
- [427] arXiv:2402.00163 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Improving Object Detection Quality in Football Through Super-Resolution TechniquesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: This study explores the potential of super-resolution techniques in enhancing object detection accuracy in football. Given the sport's fast-paced nature and the critical importance of precise object (e.g. ball, player) tracking for both analysis and broadcasting, super-resolution could offer significant improvements. We investigate how advanced image processing through super-resolution impacts the accuracy and reliability of object detection algorithms in processing football match footage.
Our methodology involved applying state-of-the-art super-resolution techniques to a diverse set of football match videos from SoccerNet, followed by object detection using Faster R-CNN. The performance of these algorithms, both with and without super-resolution enhancement, was rigorously evaluated in terms of detection accuracy.
The results indicate a marked improvement in object detection accuracy when super-resolution preprocessing is applied. The improvement of object detection through the integration of super-resolution techniques yields significant benefits, especially for low-resolution scenarios, with a notable 12\% increase in mean Average Precision (mAP) at an IoU (Intersection over Union) range of 0.50:0.95 for 320x240 size images when increasing the resolution fourfold using RLFN. As the dimensions increase, the magnitude of improvement becomes more subdued; however, a discernible improvement in the quality of detection is consistently evident. Additionally, we discuss the implications of these findings for real-time sports analytics, player tracking, and the overall viewing experience. The study contributes to the growing field of sports technology by demonstrating the practical benefits and limitations of integrating super-resolution techniques in football analytics and broadcasting. - [428] arXiv:2402.00219 (cross-list from cs.DC) [ pdf , ps , html , other ]
-
Title: FedCore: Straggler-Free Federated Learning with Distributed CoresetsHongpeng Guo , Haotian Gu , Xiaoyang Wang , Bo Chen , Eun Kyung Lee , Tamar Eilam , Deming Chen , Klara NahrstedtSubjects: Distributed, Parallel, and Cluster Computing (cs.DC) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Federated learning (FL) is a machine learning paradigm that allows multiple clients to collaboratively train a shared model while keeping their data on-premise. However, the straggler issue, due to slow clients, often hinders the efficiency and scalability of FL. This paper presents FedCore, an algorithm that innovatively tackles the straggler problem via the decentralized selection of coresets, representative subsets of a dataset. Contrary to existing centralized coreset methods, FedCore creates coresets directly on each client in a distributed manner, ensuring privacy preservation in FL. FedCore translates the coreset optimization problem into a more tractable k-medoids clustering problem and operates distributedly on each client. Theoretical analysis confirms FedCore's convergence, and practical evaluations demonstrate an 8x reduction in FL training time, without compromising model accuracy. Our extensive evaluations also show that FedCore generalizes well to existing FL frameworks.
- [429] arXiv:2402.00232 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Learning Label Hierarchy with Supervised Contrastive LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Supervised contrastive learning (SCL) frameworks treat each class as independent and thus consider all classes to be equally important. This neglects the common scenario in which label hierarchy exists, where fine-grained classes under the same category show more similarity than very different ones. This paper introduces a family of Label-Aware SCL methods (LASCL) that incorporates hierarchical information to SCL by leveraging similarities between classes, resulting in creating a more well-structured and discriminative feature space. This is achieved by first adjusting the distance between instances based on measures of the proximity of their classes with the scaled instance-instance-wise contrastive. An additional instance-center-wise contrastive is introduced to move within-class examples closer to their centers, which are represented by a set of learnable label parameters. The learned label parameters can be directly used as a nearest neighbor classifier without further finetuning. In this way, a better feature representation is generated with improvements of intra-cluster compactness and inter-cluster separation. Experiments on three datasets show that the proposed LASCL works well on text classification of distinguishing a single label among multi-labels, outperforming the baseline supervised approaches. Our code is publicly available.
- [430] arXiv:2402.00234 (cross-list from cs.HC) [ pdf , ps , html , other ]
-
Title: Are Generative AI systems Capable of Supporting Information Needs of Patients?Shreya Rajagopal , Subhashis Hazarika , Sookyung Kim , Yan-ming Chiou , Jae Ho Sohn , Hari Subramonyam , Shiwali MohanSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Patients managing a complex illness such as cancer face a complex information challenge where they not only must learn about their illness but also how to manage it. Close interaction with healthcare experts (radiologists, oncologists) can improve patient learning and thereby, their disease outcome. However, this approach is resource intensive and takes expert time away from other critical tasks. Given the recent advancements in Generative AI models aimed at improving the healthcare system, our work investigates whether and how generative visual question answering systems can responsibly support patient information needs in the context of radiology imaging data. We conducted a formative need-finding study in which participants discussed chest computed tomography (CT) scans and associated radiology reports of a fictitious close relative with a cardiothoracic radiologist. Using thematic analysis of the conversation between participants and medical experts, we identified commonly occurring themes across interactions, including clarifying medical terminology, locating the problems mentioned in the report in the scanned image, understanding disease prognosis, discussing the next diagnostic steps, and comparing treatment options. Based on these themes, we evaluated two state-of-the-art generative visual language models against the radiologist's responses. Our results reveal variability in the quality of responses generated by the models across various themes. We highlight the importance of patient-facing generative AI systems to accommodate a diverse range of conversational themes, catering to the real-world informational needs of patients.
- [431] arXiv:2402.00251 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Efficient Non-Parametric Uncertainty Quantification for Black-Box Large Language Models and Decision PlanningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Step-by-step decision planning with large language models (LLMs) is gaining attention in AI agent development. This paper focuses on decision planning with uncertainty estimation to address the hallucination problem in language models. Existing approaches are either white-box or computationally demanding, limiting use of black-box proprietary LLMs within budgets. The paper's first contribution is a non-parametric uncertainty quantification method for LLMs, efficiently estimating point-wise dependencies between input-decision on the fly with a single inference, without access to token logits. This estimator informs the statistical interpretation of decision trustworthiness. The second contribution outlines a systematic design for a decision-making agent, generating actions like ``turn on the bathroom light'' based on user prompts such as ``take a bath''. Users will be asked to provide preferences when more than one action has high estimated point-wise dependencies. In conclusion, our uncertainty estimation and decision-making agent design offer a cost-efficient approach for AI agent development.
- [432] arXiv:2402.00254 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Vertical Symbolic Regression via Deep Policy GradientComments: see animated demo at: this http URLSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Vertical Symbolic Regression (VSR) recently has been proposed to expedite the discovery of symbolic equations with many independent variables from experimental data. VSR reduces the search spaces following the vertical discovery path by building from reduced-form equations involving a subset of independent variables to full-fledged ones. Proved successful by many symbolic regressors, deep neural networks are expected to further scale up VSR. Nevertheless, directly combining VSR with deep neural networks will result in difficulty in passing gradients and other engineering issues. We propose Vertical Symbolic Regression using Deep Policy Gradient (VSR-DPG) and demonstrate that VSR-DPG can recover ground-truth equations involving multiple input variables, significantly beyond both deep reinforcement learning-based approaches and previous VSR variants. Our VSR-DPG models symbolic regression as a sequential decision-making process, in which equations are built from repeated applications of grammar rules. The integrated deep model is trained to maximize a policy gradient objective. Experimental results demonstrate that our VSR-DPG significantly outperforms popular baselines in identifying both algebraic equations and ordinary differential equations on a series of benchmarks.
- [433] arXiv:2402.00260 (cross-list from cs.RO) [ pdf , ps , html , other ]
-
Title: Towards scalable robotic intervention of children with Autism Spectrum Disorder using LLMsComments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessibleSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: In this paper, we propose a social robot capable of verbally interacting with children with Autism Spectrum Disorder (ASD). This communication is meant to teach perspective-taking using text generated using a Large Language Model (LLM) pipeline. The social robot NAO acts as a stimulator (verbally describes a social situation and asks a question), prompter (presents three options to choose from), and reinforcer (praises when the answer is correct). For the role of the stimulator, the social situation, questions, and options are generated using our LLM pipeline. We compare two approaches: GPT-2 + BART and GPT-2 + GPT-2, where the first GPT-2 common between the pipelines is used for unsupervised social situation generation. We use the SOCIALIQA dataset to fine-tune all of our LLM pipelines. We found that the GPT-2 + BART pipeline had a better BERTscore for generating the questions and the options by combining their individual loss functions. This observation was also consistent with the human evaluations. Lastly, the unsupervised generation of social situations was visualized using T-SNE plots, and the entire pipeline was evaluated for appropriateness for children with ASD by human experts.
- [434] arXiv:2402.00284 (cross-list from cs.IR) [ pdf , ps , html , other ]
-
Title: PAP-REC: Personalized Automatic Prompt for Recommendation Language ModelSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Recently emerged prompt-based Recommendation Language Models (RLM) can solve multiple recommendation tasks uniformly. The RLMs make full use of the inherited knowledge learned from the abundant pre-training data to solve the downstream recommendation tasks by prompts, without introducing additional parameters or network training. However, handcrafted prompts require significant expertise and human effort since slightly rewriting prompts may cause massive performance changes. In this paper, we propose PAP-REC, a framework to generate the Personalized Automatic Prompt for RECommendation language models to mitigate the inefficiency and ineffectiveness problems derived from manually designed prompts. Specifically, personalized automatic prompts allow different users to have different prompt tokens for the same task, automatically generated using a gradient-based method. One challenge for personalized automatic prompt generation for recommendation language models is the extremely large search space, leading to a long convergence time. To effectively and efficiently address the problem, we develop surrogate metrics and leverage an alternative updating schedule for prompting recommendation language models. Experimental results show that our PAP-REC framework manages to generate personalized prompts, and the automatically generated prompts outperform manually constructed prompts and also outperform various baseline recommendation models. The source code of the work is available at this https URL .
- [435] arXiv:2402.00306 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: An Accurate and Low-Parameter Machine Learning Architecture for Next Location PredictionComments: Paper was accepted and presented in person at the 2023 IEEE Future Networks World Forum, in Baltimore, Maryland, USASubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Next location prediction is a discipline that involves predicting a users next location. Its applications include resource allocation, quality of service, energy efficiency, and traffic management. This paper proposes an energy-efficient, small, and low parameter machine learning (ML) architecture for accurate next location prediction, deployable on modest base stations and edge devices. To accomplish this we ran a hundred hyperparameter experiments on the full human mobility patterns of an entire city, to determine an exact ML architecture that reached a plateau of accuracy with the least amount of model parameters. We successfully achieved a reduction in the number of model parameters within published ML architectures from 202 million down to 2 million. This reduced the total size of the model parameters from 791 MB down to 8 MB. Additionally, this decreased the training time by a factor of four, the amount of graphics processing unit (GPU) memory needed for training by a factor of twenty, and the overall accuracy was increased from 80.16% to 82.54%. This improvement allows for modest base stations and edge devices which do not have a large amount of memory or storage, to deploy and utilize the proposed ML architecture for next location prediction.
- [436] arXiv:2402.00312 (cross-list from q-bio.OT) [ pdf , ps , other ]
-
Title: The whack-a-mole governance challenge for AI-enabled synthetic biology: literature review and emerging frameworksJournal-ref: Front. Bioeng. Biotechnol. 12:1359768.Subjects: Other Quantitative Biology (q-bio.OT) ; Artificial Intelligence (cs.AI)
Abstract: AI-enabled synthetic biology has tremendous potential but also significantly increases biorisks and brings about a new set of dual use concerns. The picture is complicated given the vast innovations envisioned to emerge by combining emerging technologies, as AI-enabled synthetic biology potentially scales up bioengineering into industrial biomanufacturing. However, the literature review indicates that goals such as maintaining a reasonable scope for innovation, or more ambitiously to foster a huge bioeconomy don't necessarily contrast with biosafety, but need to go hand in hand. This paper presents a literature review of the issues and describes emerging frameworks for policy and practice that transverse the options of command-and control, stewardship, bottom-up, and laissez-faire governance. How to achieve early warning systems that enable prevention and mitigation of future AI-enabled biohazards from the lab, from deliberate misuse, or from the public realm, will constantly need to evolve, and adaptive, interactive approaches should emerge. Although biorisk is subject to an established governance regime, and scientists generally adhere to biosafety protocols, even experimental, but legitimate use by scientists could lead to unexpected developments. Recent advances in chatbots enabled by generative AI have revived fears that advanced biological insight can more easily get into the hands of malignant individuals or organizations. Given these sets of issues, society needs to rethink how AI-enabled synthetic biology should be governed. The suggested way to visualize the challenge at hand is whack-a-mole governance, although the emerging solutions are perhaps not so different either.
- [437] arXiv:2402.00319 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: SCO-VIST: Social Interaction Commonsense Knowledge-based Visual StorytellingSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Visual storytelling aims to automatically generate a coherent story based on a given image sequence. Unlike tasks like image captioning, visual stories should contain factual descriptions, worldviews, and human social commonsense to put disjointed elements together to form a coherent and engaging human-writeable story. However, most models mainly focus on applying factual information and using taxonomic/lexical external knowledge when attempting to create stories. This paper introduces SCO-VIST, a framework representing the image sequence as a graph with objects and relations that includes human action motivation and its social interaction commonsense knowledge. SCO-VIST then takes this graph representing plot points and creates bridges between plot points with semantic and occurrence-based edge weights. This weighted story graph produces the storyline in a sequence of events using Floyd-Warshall's algorithm. Our proposed framework produces stories superior across multiple metrics in terms of visual grounding, coherence, diversity, and humanness, per both automatic and human evaluations.
- [438] arXiv:2402.00334 (cross-list from cs.MA) [ pdf , ps , html , other ]
-
Title: Multi-agent Path Finding for Cooperative Autonomous DrivingComments: 7 pages, 3 figures, IEEE International Conference on Robotics and Automation (ICRA), 2024Subjects: Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: Anticipating possible future deployment of connected and automated vehicles (CAVs), cooperative autonomous driving at intersections has been studied by many works in control theory and intelligent transportation across decades. Simultaneously, recent parallel works in robotics have devised efficient algorithms for multi-agent path finding (MAPF), though often in environments with simplified kinematics. In this work, we hybridize insights and algorithms from MAPF with the structure and heuristics of optimizing the crossing order of CAVs at signal-free intersections. We devise an optimal and complete algorithm, Order-based Search with Kinematics Arrival Time Scheduling (OBS-KATS), which significantly outperforms existing algorithms, fixed heuristics, and prioritized planning with KATS. The performance is maintained under different vehicle arrival rates, lane lengths, crossing speeds, and control horizon. Through ablations and dissections, we offer insight on the contributing factors to OBS-KATS's performance. Our work is directly applicable to many similarly scaled traffic and multi-robot scenarios with directed lanes.
- [439] arXiv:2402.00348 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: ODICE: Revealing the Mystery of Distribution Correction Estimation via Orthogonal-gradient UpdateComments: Spotlight @ ICLR 2024, first two authors contribute equallySubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In this study, we investigate the DIstribution Correction Estimation (DICE) methods, an important line of work in offline reinforcement learning (RL) and imitation learning (IL). DICE-based methods impose state-action-level behavior constraint, which is an ideal choice for offline learning. However, they typically perform much worse than current state-of-the-art (SOTA) methods that solely use action-level behavior constraint. After revisiting DICE-based methods, we find there exist two gradient terms when learning the value function using true-gradient update: forward gradient (taken on the current state) and backward gradient (taken on the next state). Using forward gradient bears a large similarity to many offline RL methods, and thus can be regarded as applying action-level constraint. However, directly adding the backward gradient may degenerate or cancel out its effect if these two gradients have conflicting directions. To resolve this issue, we propose a simple yet effective modification that projects the backward gradient onto the normal plane of the forward gradient, resulting in an orthogonal-gradient update, a new learning rule for DICE-based methods. We conduct thorough theoretical analyses and find that the projected backward gradient brings state-level behavior regularization, which reveals the mystery of DICE-based methods: the value learning objective does try to impose state-action-level constraint, but needs to be used in a corrected way. Through toy examples and extensive experiments on complex offline RL and IL tasks, we demonstrate that DICE-based methods using orthogonal-gradient updates (O-DICE) achieve SOTA performance and great robustness.
- [440] arXiv:2402.00350 (cross-list from cs.SE) [ pdf , ps , other ]
-
Title: Large Language Models Based Fuzzing Techniques: A SurveyComments: 9 pages submission under reviewSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: In the modern era where software plays a pivotal role, software security and vulnerability analysis have become essential for software development. Fuzzing test, as an efficient software testing method, are widely used in various domains. Moreover, the rapid development of Large Language Models (LLMs) has facilitated their application in the field of software testing, demonstrating remarkable performance. Considering that existing fuzzing test techniques are not entirely automated and software vulnerabilities continue to evolve, there is a growing trend towards employing fuzzing test generated based on large language models. This survey provides a systematic overview of the approaches that fuse LLMs and fuzzing tests for software testing. In this paper, a statistical analysis and discussion of the literature in three areas, namely LLMs, fuzzing test, and fuzzing test generated based on LLMs, are conducted by summarising the state-of-the-art methods up until 2024. Our survey also investigates the potential for widespread deployment and application of fuzzing test techniques generated by LLMs in the future.
- [441] arXiv:2402.00355 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Adaptive Primal-Dual Method for Safe Reinforcement LearningWeiqin Chen , James Onyejizu , Long Vu , Lan Hoang , Dharmashankar Subramanian , Koushik Kar , Sandipan Mishra , Santiago PaternainSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Abstract: Primal-dual methods have a natural application in Safe Reinforcement Learning (SRL), posed as a constrained policy optimization problem. In practice however, applying primal-dual methods to SRL is challenging, due to the inter-dependency of the learning rate (LR) and Lagrangian multipliers (dual variables) each time an embedded unconstrained RL problem is solved. In this paper, we propose, analyze and evaluate adaptive primal-dual (APD) methods for SRL, where two adaptive LRs are adjusted to the Lagrangian multipliers so as to optimize the policy in each iteration. We theoretically establish the convergence, optimality and feasibility of the APD algorithm. Finally, we conduct numerical evaluation of the practical APD algorithm with four well-known environments in Bullet-Safey-Gym employing two state-of-the-art SRL algorithms: PPO-Lagrangian and DDPG-Lagrangian. All experiments show that the practical APD algorithm outperforms (or achieves comparable performance) and attains more stable training than the constant LR cases. Additionally, we substantiate the robustness of selecting the two adaptive LRs by empirical evidence.
- [442] arXiv:2402.00362 (cross-list from physics.ao-ph) [ pdf , ps , other ]
-
Title: Climate Trends of Tropical Cyclone Intensity and Energy Extremes Revealed by Deep LearningComments: 41 pagesSubjects: Atmospheric and Oceanic Physics (physics.ao-ph) ; Artificial Intelligence (cs.AI)
Abstract: Anthropogenic influences have been linked to tropical cyclone (TC) poleward migration, TC extreme precipitation, and an increased proportion of major hurricanes [1, 2, 3, 4]. Understanding past TC trends and variability is critical for projecting future TC impacts on human society considering the changing climate [5]. However, past trends of TC structure/energy remain uncertain due to limited observations; subjective-analyzed and spatiotemporal-heterogeneous "best-track" datasets lead to reduced confidence in the assessed TC repose to climate change [6, 7]. Here, we use deep learning to reconstruct past "observations" and yield an objective global TC wind profile dataset during 1981 to 2020, facilitating a comprehensive examination of TC structure/energy. By training with uniquely labeled data integrating best tracks and numerical model analysis of 2004 to 2018 TCs, our model converts multichannel satellite imagery to a 0-750-km wind profile of axisymmetric surface winds. The model performance is verified to be sufficient for climate studies by comparing it to independent satellite-radar surface winds. Based on the new homogenized dataset, the major TC proportion has increased by ~13% in the past four decades. Moreover, the proportion of extremely high-energy TCs has increased by ~25%, along with an increasing trend (> one standard deviation of the 40-y variability) of the mean total energy of high-energy TCs. Although the warming ocean favors TC intensification, the TC track migration to higher latitudes and altered environments further affect TC structure/energy. This new deep learning method/dataset reveals novel trends regarding TC structure extremes and may help verify simulations/studies regarding TCs in the changing climate.
- [443] arXiv:2402.00366 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: Legged Robot State Estimation With Invariant Extended Kalman Filter Using Neural Measurement NetworkComments: 8pages, 6paper, This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessibleSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: This paper introduces a novel proprioceptive state estimator for legged robots that combines model-based filters and deep neural networks. Recent studies have shown that neural networks such as multi-layer perceptron or recurrent neural networks can estimate the robot states, including contact probability and linear velocity. Inspired by this, we develop a state estimation framework that integrates a neural measurement network (NMN) with an invariant extended Kalman filter. We show that our framework improves estimation performance in various terrains. Existing studies that combine model-based filters and learning-based approaches typically use real-world data. However, our approach relies solely on simulation data, as it allows us to easily obtain extensive data. This difference leads to a gap between the learning and the inference domain, commonly referred to as a sim-to-real gap. We address this challenge by adapting existing learning techniques and regularization. To validate our proposed method, we conduct experiments using a quadruped robot on four types of terrain: \textit{flat}, \textit{debris}, \textit{soft}, and \textit{slippery}. We observe that our approach significantly reduces position drift compared to the existing model-based state estimator.
- [444] arXiv:2402.00388 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Cumulative Distribution Function based General Temporal Point ProcessesMaolin Wang , Yu Pan , Zenglin Xu , Ruocheng Guo , Xiangyu Zhao , Wanyu Wang , Yiqi Wang , Zitao Liu , Langming LiuSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: Temporal Point Processes (TPPs) hold a pivotal role in modeling event sequences across diverse domains, including social networking and e-commerce, and have significantly contributed to the advancement of recommendation systems and information retrieval strategies. Through the analysis of events such as user interactions and transactions, TPPs offer valuable insights into behavioral patterns, facilitating the prediction of future trends. However, accurately forecasting future events remains a formidable challenge due to the intricate nature of these patterns. The integration of Neural Networks with TPPs has ushered in the development of advanced deep TPP models. While these models excel at processing complex and nonlinear temporal data, they encounter limitations in modeling intensity functions, grapple with computational complexities in integral computations, and struggle to capture long-range temporal dependencies effectively. In this study, we introduce the CuFun model, representing a novel approach to TPPs that revolves around the Cumulative Distribution Function (CDF). CuFun stands out by uniquely employing a monotonic neural network for CDF representation, utilizing past events as a scaling factor. This innovation significantly bolsters the model's adaptability and precision across a wide range of data scenarios. Our approach addresses several critical issues inherent in traditional TPP modeling: it simplifies log-likelihood calculations, extends applicability beyond predefined density function forms, and adeptly captures long-range temporal patterns. Our contributions encompass the introduction of a pioneering CDF-based TPP model, the development of a methodology for incorporating past event information into future event prediction, and empirical validation of CuFun's effectiveness through extensive experimentation on synthetic and real-world datasets.
- [445] arXiv:2402.00389 (cross-list from math.OC) [ pdf , ps , html , other ]
-
Title: On the $O(\frac{\sqrt{d}}{T^{1/4}})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ NormComments: V3 vs V2: A fairer comparison with (Li et al., 2023). V2 vs V1: (1) Correct one error in v1. (2) Improve the convergence rate matching the lower bound with respect to all the coefficients except the dimensionSubjects: Optimization and Control (math.OC) ; Artificial Intelligence (cs.AI)
Abstract: Although adaptive gradient methods have been extensively used in deep learning, their convergence rates proved in the literature are all slower than that of SGD, particularly with respect to their dependence on the dimension. This paper considers the classical RMSProp and its momentum extension and establishes the convergence rate of $\frac{1}{T}\sum_{k=1}^T E\left[\|\nabla f(x^k)\|_1\right]\leq O(\frac{\sqrt{d}C}{T^{1/4}})$ measured by $\ell_1$ norm without the bounded gradient assumption, where $d$ is the dimension of the optimization variable, $T$ is the iteration number, and $C$ is a constant identical to that appeared in the optimal convergence rate of SGD. Our convergence rate matches the lower bound with respect to all the coefficients except the dimension $d$. Since $\|x\|_2\ll\|x\|_1\leq\sqrt{d}\|x\|_2$ for problems with extremely large $d$, our convergence rate can be considered to be analogous to the $\frac{1}{T}\sum_{k=1}^T E\left[\|\nabla f(x^k)\|_2\right]\leq O(\frac{C}{T^{1/4}})$ rate of SGD in the ideal case of $\|\nabla f(x)\|_1=\varTheta(\sqrt{d}\|\nabla f(x)\|_2)$.
- [446] arXiv:2402.00390 (cross-list from cs.IR) [ pdf , ps , other ]
-
Title: EASRec: Elastic Architecture Search for Efficient Long-term Sequential Recommender SystemsSheng Zhang , Maolin Wang , Yao Zhao , Chenyi Zhuang , Jinjie Gu , Ruocheng Guo , Xiangyu Zhao , Zijian Zhang , Hongzhi YinSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI)
Abstract: In this age where data is abundant, the ability to distill meaningful insights from the sea of information is essential. Our research addresses the computational and resource inefficiencies that current Sequential Recommender Systems (SRSs) suffer from. especially those employing attention-based models like SASRec, These systems are designed for next-item recommendations in various applications, from e-commerce to social networks. However, such systems suffer from substantial computational costs and resource consumption during the inference stage. To tackle these issues, our research proposes a novel method that combines automatic pruning techniques with advanced model architectures. We also explore the potential of resource-constrained Neural Architecture Search (NAS), a technique prevalent in the realm of recommendation systems, to fine-tune models for reduced FLOPs, latency, and energy usage while retaining or even enhancing accuracy. The main contribution of our work is developing the Elastic Architecture Search for Efficient Long-term Sequential Recommender Systems (EASRec). This approach aims to find optimal compact architectures for attention-based SRSs, ensuring accuracy retention. EASRec introduces data-aware gates that leverage historical information from input data batch to improve the performance of the recommendation network. Additionally, it utilizes a dynamic resource constraint approach, which standardizes the search process and results in more appropriate architectures. The effectiveness of our methodology is validated through exhaustive experiments on three benchmark datasets, which demonstrates EASRec's superiority in SRSs. Our research set a new standard for future exploration into efficient and accurate recommender systems, signifying a substantial advancement within this swiftly advancing field.
- [447] arXiv:2402.00393 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: Loss Function Considering Dead Zone for Neural NetworksComments: 6 pages, 6 figures, Accepted at AMC2024Subjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: It is important to reveal the inverse dynamics of manipulators to improve control performance of model-based control. Neural networks (NNs) are promising techniques to represent complicated inverse dynamics while they require a large amount of motion data. However, motion data in dead zones of actuators is not suitable for training models decreasing the number of useful training data. In this study, based on the fact that the manipulator joint does not work irrespective of input torque in dead zones, we propose a new loss function that considers only errors of joints not in dead zones. The proposed method enables to increase in the amount of motion data available for training and the accuracy of the inverse dynamics computation. Experiments on actual equipment using a three-degree-of-freedom (DOF) manipulator showed higher accuracy than conventional methods. We also confirmed and discussed the behavior of the model of the proposed method in dead zones.
- [448] arXiv:2402.00396 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Efficient Exploration for LLMsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Methodology (stat.ME); Machine Learning (stat.ML)
Abstract: We present evidence of substantial benefit from efficient exploration in gathering human feedback to improve large language models. In our experiments, an agent sequentially generates queries while fitting a reward model to the feedback received. Our best-performing agent generates queries using double Thompson sampling, with uncertainty represented by an epistemic neural network. Our results demonstrate that efficient exploration enables high levels of performance with far fewer queries. Further, both uncertainty estimation and the choice of exploration scheme play critical roles.
- [449] arXiv:2402.00397 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Multi-scale Traffic Pattern Bank for Cross-city Few-shot Traffic ForecastingComments: Under review. Text overlap with arXiv:2308.09727Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Traffic forecasting is crucial for intelligent transportation systems (ITS), aiding in efficient resource allocation and effective traffic control. However, its effectiveness often relies heavily on abundant traffic data, while many cities lack sufficient data due to limited device support, posing a significant challenge for traffic forecasting. Recognizing this challenge, we have made a noteworthy observation: traffic patterns exhibit similarities across diverse cities. Building on this key insight, we propose a solution for the cross-city few-shot traffic forecasting problem called Multi-scale Traffic Pattern Bank (MTPB). Primarily, MTPB initiates its learning process by leveraging data-rich source cities, effectively acquiring comprehensive traffic knowledge through a spatial-temporal-aware pre-training process. Subsequently, the framework employs advanced clustering techniques to systematically generate a multi-scale traffic pattern bank derived from the learned knowledge. Next, the traffic data of the data-scarce target city could query the traffic pattern bank, facilitating the aggregation of meta-knowledge. This meta-knowledge, in turn, assumes a pivotal role as a robust guide in subsequent processes involving graph reconstruction and forecasting. Empirical assessments conducted on real-world traffic datasets affirm the superior performance of MTPB, surpassing existing methods across various categories and exhibiting numerous attributes conducive to the advancement of cross-city few-shot forecasting methodologies. The code is available in this https URL .
- [450] arXiv:2402.00402 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Investigating Bias Representations in Llama 2 Chat via Activation SteeringSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: We address the challenge of societal bias in Large Language Models (LLMs), focusing on the Llama 2 7B Chat model. As LLMs are increasingly integrated into decision-making processes with substantial societal impact, it becomes imperative to ensure these models do not reinforce existing biases. Our approach employs activation steering to probe for and mitigate biases related to gender, race, and religion. This method manipulates model activations to direct responses towards or away from biased outputs, utilizing steering vectors derived from the StereoSet dataset and custom GPT4 generated gender bias prompts. Our findings reveal inherent gender bias in Llama 2 7B Chat, persisting even after Reinforcement Learning from Human Feedback (RLHF). We also observe a predictable negative correlation between bias and the model's tendency to refuse responses. Significantly, our study uncovers that RLHF tends to increase the similarity in the model's representation of different forms of societal biases, which raises questions about the model's nuanced understanding of different forms of bias. This work also provides valuable insights into effective red-teaming strategies for LLMs using activation steering, particularly emphasizing the importance of integrating a refusal vector.
- [451] arXiv:2402.00411 (cross-list from cs.NE) [ pdf , ps , html , other ]
-
Title: LM-HT SNN: Enhancing the Performance of SNN to ANN Counterpart through Learnable Multi-hierarchical Threshold ModelComments: 15 pages, 2 figuresSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Compared to traditional Artificial Neural Network (ANN), Spiking Neural Network (SNN) has garnered widespread academic interest for its intrinsic ability to transmit information in a more biological-inspired and energy-efficient manner. However, despite previous efforts to optimize the learning gradients and model structure of SNNs through various methods, SNNs still lag behind ANNs in terms of performance to some extent. The recently proposed multi-threshold model provides more possibilities for further enhancing the learning capability of SNNs. In this paper, we rigorously analyze the relationship among the multi-threshold model, vanilla spiking model and quantized ANNs from a mathematical perspective, then propose a novel LM-HT model, which is an equidistant multi-hierarchical model that can dynamically regulate the global input current and membrane potential leakage on the time dimension. In addition, we note that the direct training algorithm based on the LM-HT model can seamlessly integrate with the traditional ANN-SNN Conversion framework. This novel hybrid learning framework can effectively improve the relatively poor performance of converted SNNs under low time latency. Extensive experimental results have demonstrated that our LM-HT model can significantly outperform previous state-of-the-art works on various types of datasets, which promote SNNs to achieve a brand-new level of performance comparable to quantized ANNs.
- [452] arXiv:2402.00412 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Hidding the Ghostwriters: An Adversarial Evaluation of AI-Generated Student Essay DetectionComments: Accepted by EMNLP 2023 Main conference, Oral PresentationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large language models (LLMs) have exhibited remarkable capabilities in text generation tasks. However, the utilization of these models carries inherent risks, including but not limited to plagiarism, the dissemination of fake news, and issues in educational exercises. Although several detectors have been proposed to address these concerns, their effectiveness against adversarial perturbations, specifically in the context of student essay writing, remains largely unexplored. This paper aims to bridge this gap by constructing AIG-ASAP, an AI-generated student essay dataset, employing a range of text perturbation methods that are expected to generate high-quality essays while evading detection. Through empirical experiments, we assess the performance of current AIGC detectors on the AIG-ASAP dataset. The results reveal that the existing detectors can be easily circumvented using straightforward automatic adversarial attacks. Specifically, we explore word substitution and sentence substitution perturbation methods that effectively evade detection while maintaining the quality of the generated essays. This highlights the urgent need for more accurate and robust methods to detect AI-generated student essays in the education domain.
- [453] arXiv:2402.00414 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Prompt-Time Symbolic Knowledge Capture with Large Language ModelsTolga Çöplü , Arto Bendiken , Andrii Skomorokhov , Eduard Bateiko , Stephen Cobb , Joshua J. Bouw (Haltia, Inc.)Comments: 8 pages, 5 figures, 1 table preprint. Under reviewSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Augmenting large language models (LLMs) with user-specific knowledge is crucial for real-world applications, such as personal AI assistants. However, LLMs inherently lack mechanisms for prompt-driven knowledge capture. This paper investigates utilizing the existing LLM capabilities to enable prompt-driven knowledge capture, with a particular emphasis on knowledge graphs. We address this challenge by focusing on prompt-to-triple (P2T) generation. We explore three methods: zero-shot prompting, few-shot prompting, and fine-tuning, and then assess their performance via a specialized synthetic dataset. Our code and datasets are publicly available at this https URL .
- [454] arXiv:2402.00435 (cross-list from math.NA) [ pdf , ps , other ]
-
Title: A practical existence theorem for reduced order models based on convolutional autoencodersSubjects: Numerical Analysis (math.NA) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: In recent years, deep learning has gained increasing popularity in the fields of Partial Differential Equations (PDEs) and Reduced Order Modeling (ROM), providing domain practitioners with new powerful data-driven techniques such as Physics-Informed Neural Networks (PINNs), Neural Operators, Deep Operator Networks (DeepONets) and Deep-Learning based ROMs (DL-ROMs). In this context, deep autoencoders based on Convolutional Neural Networks (CNNs) have proven extremely effective, outperforming established techniques, such as the reduced basis method, when dealing with complex nonlinear problems. However, despite the empirical success of CNN-based autoencoders, there are only a few theoretical results supporting these architectures, usually stated in the form of universal approximation theorems. In particular, although the existing literature provides users with guidelines for designing convolutional autoencoders, the subsequent challenge of learning the latent features has been barely investigated. Furthermore, many practical questions remain unanswered, e.g., the number of snapshots needed for convergence or the neural network training strategy. In this work, using recent techniques from sparse high-dimensional function approximation, we fill some of these gaps by providing a new practical existence theorem for CNN-based autoencoders when the parameter-to-solution map is holomorphic. This regularity assumption arises in many relevant classes of parametric PDEs, such as the parametric diffusion equation, for which we discuss an explicit application of our general theory.
- [455] arXiv:2402.00447 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: A Survey of Data-Efficient Graph LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Abstract: Graph-structured data, prevalent in domains ranging from social networks to biochemical analysis, serve as the foundation for diverse real-world systems. While graph neural networks demonstrate proficiency in modeling this type of data, their success is often reliant on significant amounts of labeled data, posing a challenge in practical scenarios with limited annotation resources. To tackle this problem, tremendous efforts have been devoted to enhancing graph machine learning performance under low-resource settings by exploring various approaches to minimal supervision. In this paper, we introduce a novel concept of Data-Efficient Graph Learning (DEGL) as a research frontier, and present the first survey that summarizes the current progress of DEGL. We initiate by highlighting the challenges inherent in training models with large labeled data, paving the way for our exploration into DEGL. Next, we systematically review recent advances on this topic from several key aspects, including self-supervised graph learning, semi-supervised graph learning, and few-shot graph learning. Also, we state promising directions for future research, contributing to the evolution of graph machine learning.
- [456] arXiv:2402.00448 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Dual-Student Knowledge Distillation Networks for Unsupervised Anomaly DetectionSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Due to the data imbalance and the diversity of defects, student-teacher networks (S-T) are favored in unsupervised anomaly detection, which explores the discrepancy in feature representation derived from the knowledge distillation process to recognize anomalies. However, vanilla S-T network is not stable. Employing identical structures to construct the S-T network may weaken the representative discrepancy on anomalies. But using different structures can increase the likelihood of divergent performance on normal data. To address this problem, we propose a novel dual-student knowledge distillation (DSKD) architecture. Different from other S-T networks, we use two student networks a single pre-trained teacher network, where the students have the same scale but inverted structures. This framework can enhance the distillation effect to improve the consistency in recognition of normal data, and simultaneously introduce diversity for anomaly representation. To explore high-dimensional semantic information to capture anomaly clues, we employ two strategies. First, a pyramid matching mode is used to perform knowledge distillation on multi-scale feature maps in the intermediate layers of networks. Second, an interaction is facilitated between the two student networks through a deep feature embedding module, which is inspired by real-world group discussions. In terms of classification, we obtain pixel-wise anomaly segmentation maps by measuring the discrepancy between the output feature maps of the teacher and student networks, from which an anomaly score is computed for sample-wise determination. We evaluate DSKD on three benchmark datasets and probe the effects of internal modules through ablation experiments. The results demonstrate that DSKD can achieve exceptional performance on small models like ResNet18 and effectively improve vanilla S-T networks.
- [457] arXiv:2402.00459 (cross-list from cs.NE) [ pdf , ps , other ]
-
Title: Genetic-based Constraint Programming for Resource Constrained Job SchedulingSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI)
Abstract: Resource constrained job scheduling is a hard combinatorial optimisation problem that originates in the mining industry. Off-the-shelf solvers cannot solve this problem satisfactorily in reasonable timeframes, while other solution methods such as many evolutionary computation methods and matheuristics cannot guarantee optimality and require low-level customisation and specialised heuristics to be effective. This paper addresses this gap by proposing a genetic programming algorithm to discover efficient search strategies of constraint programming for resource-constrained job scheduling. In the proposed algorithm, evolved programs represent variable selectors to be used in the search process of constraint programming, and their fitness is determined by the quality of solutions obtained for training instances. The novelties of this algorithm are (1) a new representation of variable selectors, (2) a new fitness evaluation scheme, and (3) a pre-selection mechanism. Tests with a large set of random and benchmark instances, the evolved variable selectors can significantly improve the efficiency of constraining programming. Compared to highly customised metaheuristics and hybrid algorithms, evolved variable selectors can help constraint programming identify quality solutions faster and proving optimality is possible if sufficiently large run-times are allowed. The evolved variable selectors are especially helpful when solving instances with large numbers of machines.
- [458] arXiv:2402.00474 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: SA-MDKIF: A Scalable and Adaptable Medical Domain Knowledge Injection Framework for Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Recent advances in large language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks. However, their effective application in the medical domain is hampered by a lack of medical domain knowledge. In this study, we present SA-MDKIF, a scalable and adaptable framework that aims to inject medical knowledge into general-purpose LLMs through instruction tuning, thereby enabling adaptability for various downstream tasks. SA-MDKIF consists of two stages: skill training and skill adaptation. In the first stage, we define 12 basic medical skills and use AdaLoRA to train these skills based on uniformly formatted instructional datasets that we have constructed. In the next stage, we train the skill router using task-specific downstream data and use this router to integrate the acquired skills with LLMs during inference. Experimental results on 9 different medical tasks show that SA-MDKIF improves performance by 10-20% compared to the original LLMs. Notably, this improvement is particularly pronounced for unseen medical tasks, showing an improvement of up to 30%.
- [459] arXiv:2402.00518 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit Large Language ModelsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: This work introduces EE-Tuning, a lightweight and economical solution to training/tuning early-exit large language models (LLMs). In contrast to the common approach of full-parameter pre-training, EE-Tuning augments any pre-trained (and possibly fine-tuned) standard LLM with additional early-exit layers that are tuned in a parameter-efficient manner, which requires significantly less computational resources and training data. Our implementation of EE-Tuning achieves outstanding training efficiency via extensive performance optimizations, as well as scalability due to its full compatibility with 3D parallelism. Results of systematic experiments validate the efficacy of EE-Tuning, confirming that effective early-exit LLM inference can be achieved with a limited training budget. In hope of making early-exit LLMs accessible to the community, we release the source code of our implementation of EE-Tuning at this https URL .
- [460] arXiv:2402.00588 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: BrainSLAM: SLAM on Neural Population Activity DataComments: Accepted to the 23rd International Conference on Autonomous Agents and Multiagent Systems. 2024Subjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Abstract: Simultaneous localisation and mapping (SLAM) algorithms are commonly used in robotic systems for learning maps of novel environments. Brains also appear to learn maps, but the mechanisms are not known and it is unclear how to infer these maps from neural activity data. We present BrainSLAM; a method for performing SLAM using only population activity (local field potential, LFP) data simultaneously recorded from three brain regions in rats: hippocampus, prefrontal cortex, and parietal cortex. This system uses a convolutional neural network (CNN) to decode velocity and familiarity information from wavelet scalograms of neural local field potential data recorded from rats as they navigate a 2D maze. The CNN's output drives a RatSLAM-inspired architecture, powering an attractor network which performs path integration plus a separate system which performs `loop closure' (detecting previously visited locations and correcting map aliasing errors). Together, these three components can construct faithful representations of the environment while simultaneously tracking the animal's location. This is the first demonstration of inference of a spatial map from brain recordings. Our findings expand SLAM to a new modality, enabling a new method of mapping environments and facilitating a better understanding of the role of cognitive maps in navigation and decision making.
- [461] arXiv:2402.00607 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Are Synthetic Time-series Data Really not as Good as Real Data?Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Time-series data presents limitations stemming from data quality issues, bias and vulnerabilities, and generalization problem. Integrating universal data synthesis methods holds promise in improving generalization. However, current methods cannot guarantee that the generator's output covers all unseen real data. In this paper, we introduce InfoBoost -- a highly versatile cross-domain data synthesizing framework with time series representation learning capability. We have developed a method based on synthetic data that enables model training without the need for real data, surpassing the performance of models trained with real data. Additionally, we have trained a universal feature extractor based on our synthetic data that is applicable to all time-series data. Our approach overcomes interference from multiple sources rhythmic signal, noise interference, and long-period features that exceed sampling window capabilities. Through experiments, our non-deep-learning synthetic data enables models to achieve superior reconstruction performance and universal explicit representation extraction without the need for real data.
- [462] arXiv:2402.00627 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: CapHuman: Capture Your Moments in Parallel UniversesComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: We concentrate on a novel human-centric image synthesis task, that is, given only one reference facial photograph, it is expected to generate specific individual images with diverse head positions, poses, facial expressions, and illuminations in different contexts. To accomplish this goal, we argue that our generative model should be capable of the following favorable characteristics: (1) a strong visual and semantic understanding of our world and human society for basic object and human image generation. (2) generalizable identity preservation ability. (3) flexible and fine-grained head control. Recently, large pre-trained text-to-image diffusion models have shown remarkable results, serving as a powerful generative foundation. As a basis, we aim to unleash the above two capabilities of the pre-trained model. In this work, we present a new framework named CapHuman. We embrace the "encode then learn to align" paradigm, which enables generalizable identity preservation for new individuals without cumbersome tuning at inference. CapHuman encodes identity features and then learns to align them into the latent space. Moreover, we introduce the 3D facial prior to equip our model with control over the human head in a flexible and 3D-consistent manner. Extensive qualitative and quantitative analyses demonstrate our CapHuman can produce well-identity-preserved, photo-realistic, and high-fidelity portraits with content-rich representations and various head renditions, superior to established baselines. Code and checkpoint will be released at this https URL .
- [463] arXiv:2402.00672 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Exploring Homogeneous and Heterogeneous Consistent Label Associations for Unsupervised Visible-Infrared Person ReIDSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Unsupervised visible-infrared person re-identification (USL-VI-ReID) aims to retrieve pedestrian images of the same identity from different modalities without annotations. While prior work focuses on establishing cross-modality pseudo-label associations to bridge the modality-gap, they ignore maintaining the instance-level homogeneous and heterogeneous consistency in pseudo-label space, resulting in coarse associations. In response, we introduce a Modality-Unified Label Transfer (MULT) module that simultaneously accounts for both homogeneous and heterogeneous fine-grained instance-level structures, yielding high-quality cross-modality label associations. It models both homogeneous and heterogeneous affinities, leveraging them to define the inconsistency for the pseudo-labels and then minimize it, leading to pseudo-labels that maintain alignment across modalities and consistency within intra-modality structures. Additionally, a straightforward plug-and-play Online Cross-memory Label Refinement (OCLR) module is proposed to further mitigate the impact of noisy pseudo-labels while simultaneously aligning different modalities, coupled with a Modality-Invariant Representation Learning (MIRL) framework. Experiments demonstrate that our proposed method outperforms existing USL-VI-ReID methods, highlighting the superiority of our MULT in comparison to other cross-modality association methods. The code will be available.
- [464] arXiv:2402.00676 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: Deep Robot Sketching: An application of Deep Q-Learning Networks for human-like sketchingJournal-ref: Cognitive Systems Research, Volume 81, September 2023, pages 57 to 63Subjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Abstract: The current success of Reinforcement Learning algorithms for its performance in complex environments has inspired many recent theoretical approaches to cognitive science. Artistic environments are studied within the cognitive science community as rich, natural, multi-sensory, multi-cultural environments. In this work, we propose the introduction of Reinforcement Learning for improving the control of artistic robot applications. Deep Q-learning Neural Networks (DQN) is one of the most successful algorithms for the implementation of Reinforcement Learning in robotics. DQN methods generate complex control policies for the execution of complex robot applications in a wide set of environments. Current art painting robot applications use simple control laws that limits the adaptability of the frameworks to a set of simple environments. In this work, the introduction of DQN within an art painting robot application is proposed. The goal is to study how the introduction of a complex control policy impacts the performance of a basic art painting robot application. The main expected contribution of this work is to serve as a first baseline for future works introducing DQN methods for complex art painting robot frameworks. Experiments consist of real world executions of human drawn sketches using the DQN generated policy and TEO, the humanoid robot. Results are compared in terms of similarity and obtained reward with respect to the reference inputs
- [465] arXiv:2402.00677 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: Neural Policy Style TransferJournal-ref: Cognitive Systems Research, Volume 72, March 2022, Pages 23 to 32Subjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Abstract: Style Transfer has been proposed in a number of fields: fine arts, natural language processing, and fixed trajectories. We scale this concept up to control policies within a Deep Reinforcement Learning infrastructure. Each network is trained to maximize the expected reward, which typically encodes the goal of an action, and can be described as the content. The expressive power of deep neural networks enables encoding a secondary task, which can be described as the style. The Neural Policy Style Transfer (NPST) algorithm is proposed to transfer the style of one policy to another, while maintaining the content of the latter. Different policies are defined via Deep Q-Network architectures. These models are trained using demonstrations through Inverse Reinforcement Learning. Two different sets of user demonstrations are performed, one for content and other for style. Different styles are encoded as defined by user demonstrations. The generated policy is the result of feeding a content policy and a style policy to the NPST algorithm. Experiments are performed in a catch-ball game inspired by the Deep Reinforcement Learning classical Atari games; and a real-world painting scenario with a full-sized humanoid robot, based on previous works of the authors. The implementation of three different Q-Network architectures (Shallow, Deep and Deep Recurrent Q-Network) to encode the policies within the NPST framework is proposed and the results obtained in the experiments with each of these architectures compared.
- [466] arXiv:2402.00678 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: Real Evaluations Tractability using Continuous Goal-Directed Actions in Smart City ApplicationsJournal-ref: Sensors, Volume 18, Issue 11, 2018Subjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: One of the most important challenges of Smart City Applications is to adapt the system to interact with non-expert users. Robot imitation frameworks aim to simplify and reduce times of robot programming by allowing users to program directly through demonstrations. In classical frameworks, actions are modeled using joint or Cartesian space trajectories. Other features, such as visual ones, are not always well represented with these pure geometrical approaches. Continuous Goal-Directed Actions (CGDA) is an alternative to these methods, as it encodes actions as changes of any feature that can be extracted from the environment. As a consequence of this, the robot joint trajectories for execution must be fully computed to comply with this feature-agnostic encoding. This is achieved using Evolutionary Algorithms (EA), which usually requires too many evaluations to perform this evolution step in the actual robot. Current strategies involve performing evaluations in a simulation, transferring the final joint trajectory to the actual robot. Smart City applications involve working in highly dynamic and complex environments, where having a precise model is not always achievable. Our goal is to study the tractability of performing these evaluations directly in a real-world scenario. Two different approaches to reduce the number of evaluations using EA, are proposed and compared. In the first approach, Particle Swarm Optimization (PSO)-based methods have been studied and compared within CGDA: naive PSO, Fitness Inheritance PSO (FI-PSO), and Adaptive Fuzzy Fitness Granulation with PSO (AFFG-PSO). The second approach studied the introduction of geometrical and velocity constraints within CGDA. The effects of both approaches were analyzed and compared in the wax and paint actions, two CGDA commonly studied use cases. Results from this paper depict an important reduction in the number of evaluations.
- [467] arXiv:2402.00689 (cross-list from cs.CR) [ pdf , ps , html , other ]
-
Title: Ocassionally Secure: A Comparative Analysis of Code Generation AssistantsRan Elgedawy , John Sadik , Senjuti Dutta , Anuj Gautam , Konstantinos Georgiou , Farzin Gholamrezae , Fujiao Ji , Kyungchan Lim , Qian Liu , Scott RuotiComments: 12 pages, 2 figuresSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: $ $Large Language Models (LLMs) are being increasingly utilized in various applications, with code generations being a notable example. While previous research has shown that LLMs have the capability to generate both secure and insecure code, the literature does not take into account what factors help generate secure and effective code. Therefore in this paper we focus on identifying and understanding the conditions and contexts in which LLMs can be effectively and safely deployed in real-world scenarios to generate quality code. We conducted a comparative analysis of four advanced LLMs--GPT-3.5 and GPT-4 using ChatGPT and Bard and Gemini from Google--using 9 separate tasks to assess each model's code generation capabilities. We contextualized our study to represent the typical use cases of a real-life developer employing LLMs for everyday tasks as work. Additionally, we place an emphasis on security awareness which is represented through the use of two distinct versions of our developer persona. In total, we collected 61 code outputs and analyzed them across several aspects: functionality, security, performance, complexity, and reliability. These insights are crucial for understanding the models' capabilities and limitations, guiding future development and practical applications in the field of automated code generation.
- [468] arXiv:2402.00699 (cross-list from cs.SE) [ pdf , ps , other ]
-
Title: PeaTMOSS: A Dataset and Initial Analysis of Pre-Trained Models in Open-Source SoftwareWenxin Jiang , Jerin Yasmin , Jason Jones , Nicholas Synovic , Jiashen Kuo , Nathaniel Bielanski , Yuan Tian , George K. Thiruvathukal , James C. DavisComments: Accepted at MSR'24Subjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Databases (cs.DB); Machine Learning (cs.LG)
Abstract: The development and training of deep learning models have become increasingly costly and complex. Consequently, software engineers are adopting pre-trained models (PTMs) for their downstream applications. The dynamics of the PTM supply chain remain largely unexplored, signaling a clear need for structured datasets that document not only the metadata but also the subsequent applications of these models. Without such data, the MSR community cannot comprehensively understand the impact of PTM adoption and reuse. This paper presents the PeaTMOSS dataset, which comprises metadata for 281,638 PTMs and detailed snapshots for all PTMs with over 50 monthly downloads (14,296 PTMs), along with 28,575 open-source software repositories from GitHub that utilize these models. Additionally, the dataset includes 44,337 mappings from 15,129 downstream GitHub repositories to the 2,530 PTMs they use. To enhance the dataset's comprehensiveness, we developed prompts for a large language model to automatically extract model metadata, including the model's training datasets, parameters, and evaluation metrics. Our analysis of this dataset provides the first summary statistics for the PTM supply chain, showing the trend of PTM development and common shortcomings of PTM package documentation. Our example application reveals inconsistencies in software licenses across PTMs and their dependent projects. PeaTMOSS lays the foundation for future research, offering rich opportunities to investigate the PTM supply chain. We outline mining opportunities on PTMs, their downstream usage, and cross-cutting questions.
- [469] arXiv:2402.00707 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Non-Exchangeable Conformal Language Generation with Nearest NeighborsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Quantifying uncertainty in automatically generated text is important for letting humans check potential hallucinations and making systems more reliable. Conformal prediction is an attractive framework to provide predictions imbued with statistical guarantees, however, its application to text generation is challenging since any i.i.d. assumptions are not realistic. In this paper, we bridge this gap by leveraging recent results on non-exchangeable conformal prediction, which still ensures bounds on coverage. The result, non-exchangeable conformal nucleus sampling, is a novel extension of the conformal prediction framework to generation based on nearest neighbors. Our method can be used post-hoc for an arbitrary model without extra training and supplies token-level, calibrated prediction sets equipped with statistical guarantees. Experiments in machine translation and language modeling show encouraging results in generation quality. By also producing tighter prediction sets with good coverage, we thus give a more theoretically principled way to perform sampling with conformal guarantees.
- [470] arXiv:2402.00722 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: Neural Style Transfer with Twin-Delayed DDPG for Shared Control of Robotic ManipulatorsRaul Fernandez-Fernandez , Marco Aggravi , Paolo Robuffo Giordano , Juan G. Victores , Claudio PacchierottiSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Abstract: Neural Style Transfer (NST) refers to a class of algorithms able to manipulate an element, most often images, to adopt the appearance or style of another one. Each element is defined as a combination of Content and Style: the Content can be conceptually defined as the what and the Style as the how of said element. In this context, we propose a custom NST framework for transferring a set of styles to the motion of a robotic manipulator, e.g., the same robotic task can be carried out in an angry, happy, calm, or sad way. An autoencoder architecture extracts and defines the Content and the Style of the target robot motions. A Twin Delayed Deep Deterministic Policy Gradient (TD3) network generates the robot control policy using the loss defined by the autoencoder. The proposed Neural Policy Style Transfer TD3 (NPST3) alters the robot motion by introducing the trained style. Such an approach can be implemented either offline, for carrying out autonomous robot motions in dynamic environments, or online, for adapting at runtime the style of a teleoperated robot. The considered styles can be learned online from human demonstrations. We carried out an evaluation with human subjects enrolling 73 volunteers, asking them to recognize the style behind some representative robotic motions. Results show a good recognition rate, proving that it is possible to convey different styles to a robot using this approach.
- [471] arXiv:2402.00742 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Transforming and Combining Rewards for Aligning Large Language ModelsZihao Wang , Chirag Nagpal , Jonathan Berant , Jacob Eisenstein , Alex D'Amour , Sanmi Koyejo , Victor VeitchSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: A common approach for aligning language models to human preferences is to first learn a reward model from preference data, and then use this reward model to update the language model. We study two closely related problems that arise in this approach. First, any monotone transformation of the reward model preserves preference ranking; is there a choice that is ``better'' than others? Second, we often wish to align language models to multiple properties: how should we combine multiple reward models? Using a probabilistic interpretation of the alignment procedure, we identify a natural choice for transformation for (the common case of) rewards learned from Bradley-Terry preference models. This derived transformation has two important properties. First, it emphasizes improving poorly-performing outputs, rather than outputs that already score well. This mitigates both underfitting (where some prompts are not improved) and reward hacking (where the model learns to exploit misspecification of the reward model). Second, it enables principled aggregation of rewards by linking summation to logical conjunction: the sum of transformed rewards corresponds to the probability that the output is ``good'' in all measured properties, in a sense we make precise. Experiments aligning language models to be both helpful and harmless using RLHF show substantial improvements over the baseline (non-transformed) approach.
- [472] arXiv:2402.00751 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Unlearnable Algorithms for In-context LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: Machine unlearning is a desirable operation as models get increasingly deployed on data with unknown provenance. However, achieving exact unlearning -- obtaining a model that matches the model distribution when the data to be forgotten was never used -- is challenging or inefficient, often requiring significant retraining. In this paper, we focus on efficient unlearning methods for the task adaptation phase of a pretrained large language model (LLM). We observe that an LLM's ability to do in-context learning for task adaptation allows for efficient exact unlearning of task adaptation training data. We provide an algorithm for selecting few-shot training examples to prepend to the prompt given to an LLM (for task adaptation), ERASE, whose unlearning operation cost is independent of model and dataset size, meaning it scales to large models and datasets. We additionally compare our approach to fine-tuning approaches and discuss the trade-offs between the two approaches. This leads us to propose a new holistic measure of unlearning cost which accounts for varying inference costs, and conclude that in-context learning can often be more favourable than fine-tuning for deployments involving unlearning requests.
- [473] arXiv:2402.00759 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Building Expressive and Tractable Probabilistic Generative Models: A ReviewSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: We present a comprehensive survey of the advancements and techniques in the field of tractable probabilistic generative modeling, primarily focusing on Probabilistic Circuits (PCs). We provide a unified perspective on the inherent trade-offs between expressivity and the tractability, highlighting the design principles and algorithmic extensions that have enabled building expressive and efficient PCs, and provide a taxonomy of the field. We also discuss recent efforts to build deep and hybrid PCs by fusing notions from deep neural models, and outline the challenges and open questions that can guide future research in this evolving field.
- [474] arXiv:2402.00789 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State SpacesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Attention mechanisms have been widely used to capture long-range dependencies among nodes in Graph Transformers. Bottlenecked by the quadratic computational cost, attention mechanisms fail to scale in large graphs. Recent improvements in computational efficiency are mainly achieved by attention sparsification with random or heuristic-based graph subsampling, which falls short in data-dependent context reasoning. State space models (SSMs), such as Mamba, have gained prominence for their effectiveness and efficiency in modeling long-range dependencies in sequential data. However, adapting SSMs to non-sequential graph data presents a notable challenge. In this work, we introduce Graph-Mamba, the first attempt to enhance long-range context modeling in graph networks by integrating a Mamba block with the input-dependent node selection mechanism. Specifically, we formulate graph-centric node prioritization and permutation strategies to enhance context-aware reasoning, leading to a substantial improvement in predictive performance. Extensive experiments on ten benchmark datasets demonstrate that Graph-Mamba outperforms state-of-the-art methods in long-range graph prediction tasks, with a fraction of the computational cost in both FLOPs and GPU memory consumption. The code and models are publicly available at this https URL .
- [475] arXiv:2402.00793 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Distinguishing the Indistinguishable: Human Expertise in Algorithmic PredictionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Abstract: We introduce a novel framework for incorporating human expertise into algorithmic predictions. Our approach focuses on the use of human judgment to distinguish inputs which `look the same' to any feasible predictive algorithm. We argue that this framing clarifies the problem of human/AI collaboration in prediction tasks, as experts often have access to information -- particularly subjective information -- which is not encoded in the algorithm's training data. We use this insight to develop a set of principled algorithms for selectively incorporating human feedback only when it improves the performance of any feasible predictor. We find empirically that although algorithms often outperform their human counterparts on average, human judgment can significantly improve algorithmic predictions on specific instances (which can be identified ex-ante). In an X-ray classification task, we find that this subset constitutes nearly 30% of the patient population. Our approach provides a natural way of uncovering this heterogeneity and thus enabling effective human-AI collaboration.
- [476] arXiv:2402.00794 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: ReAGent: A Model-agnostic Feature Attribution Method for Generative Language ModelsComments: Accepted at AAAI24 workshop ReLMSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Feature attribution methods (FAs), such as gradients and attention, are widely employed approaches to derive the importance of all input features to the model predictions. Existing work in natural language processing has mostly focused on developing and testing FAs for encoder-only language models (LMs) in classification tasks. However, it is unknown if it is faithful to use these FAs for decoder-only models on text generation, due to the inherent differences between model architectures and task settings respectively. Moreover, previous work has demonstrated that there is no `one-wins-all' FA across models and tasks. This makes the selection of a FA computationally expensive for large LMs since input importance derivation often requires multiple forward and backward passes including gradient computations that might be prohibitive even with access to large compute. To address these issues, we present a model-agnostic FA for generative LMs called Recursive Attribution Generator (ReAGent). Our method updates the token importance distribution in a recursive manner. For each update, we compute the difference in the probability distribution over the vocabulary for predicting the next token between using the original input and using a modified version where a part of the input is replaced with RoBERTa predictions. Our intuition is that replacing an important token in the context should have resulted in a larger change in the model's confidence in predicting the token than replacing an unimportant token. Our method can be universally applied to any generative LM without accessing internal model weights or additional training and fine-tuning, as most other FAs require. We extensively compare the faithfulness of ReAGent with seven popular FAs across six decoder-only LMs of various sizes. The results show that our method consistently provides more faithful token importance distributions.
- [477] arXiv:2402.00795 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: LLMs learn governing principles of dynamical systems, revealing an in-context neural scaling lawSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Pretrained large language models (LLMs) are surprisingly effective at performing zero-shot tasks, including time-series forecasting. However, understanding the mechanisms behind such capabilities remains highly challenging due to the complexity of the models. In this paper, we study LLMs' ability to extrapolate the behavior of dynamical systems whose evolution is governed by principles of physical interest. Our results show that LLaMA 2, a language model trained primarily on texts, achieves accurate predictions of dynamical system time series without fine-tuning or prompt engineering. Moreover, the accuracy of the learned physical rules increases with the length of the input context window, revealing an in-context version of neural scaling law. Along the way, we present a flexible and efficient algorithm for extracting probability density functions of multi-digit numbers directly from LLMs.
- [478] arXiv:2402.00798 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based AgentsComments: 21 pages, 6 figures; comments and suggestions are welcomeSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Formal Languages and Automata Theory (cs.FL)
Abstract: Recent advancements on Large Language Models (LLMs) enable AI Agents to automatically generate and execute multi-step plans to solve complex tasks. However, since LLM's content generation process is hardly controllable, current LLM-based agents frequently generate invalid or non-executable plans, which jeopardizes the performance of the generated plans and corrupts users' trust in LLM-based agents. In response, this paper proposes a novel ``Formal-LLM'' framework for LLM-based agents by integrating the expressiveness of natural language and the precision of formal language. Specifically, the framework allows human users to express their requirements or constraints for the planning process as an automaton. A stack-based LLM plan generation process is then conducted under the supervision of the automaton to ensure that the generated plan satisfies the constraints, making the planning process controllable. We conduct experiments on both benchmark tasks and practical real-life tasks, and our framework achieves over 50% overall performance increase, which validates the feasibility and effectiveness of employing Formal-LLM to guide the plan generation of agents, preventing the agents from generating invalid and unsuccessful plans. Further, more controllable LLM-based agents can facilitate the broader utilization of LLM in application scenarios where high validity of planning is essential. The work is open-sourced at this https URL .
- [479] arXiv:2402.00808 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: Exploring the Dynamics between Cobot's Production Rhythm, Locus of Control and Emotional State in a Collaborative Assembly ScenarioMarta Mondellini , Matteo Lavit Nicora , Pooja Prajod , Elisabeth André , Rocco Vertechy , Alessandro Antonietti , Matteo MalosioComments: Accepted to 4th IEEE International Conference on Human-Machine SystemsSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Abstract: In industrial scenarios, there is widespread use of collaborative robots (cobots), and growing interest is directed at evaluating and measuring the impact of some characteristics of the cobot on the human factor. In the present pilot study, the effect that the production rhythm (C1 - Slow, C2 - Fast, C3 - Adapted to the participant's pace) of a cobot has on the Experiential Locus of Control (ELoC) and the emotional state of 31 participants has been examined. The operators' performance, the degree of basic internal Locus of Control, and the attitude towards the robots were also considered. No difference was found regarding the emotional state and the ELoC in the three conditions, but considering the other psychological variables, a more complex situation emerges. Overall, results seem to indicate a need to consider the person's psychological characteristics to offer a differentiated and optimal interaction experience.
- [480] arXiv:2402.00816 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Leveraging Approximate Model-based Shielding for Probabilistic Safety Guarantees in Continuous EnvironmentsComments: Accepted as an Extended Abstract at AAMAS 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Shielding is a popular technique for achieving safe reinforcement learning (RL). However, classical shielding approaches come with quite restrictive assumptions making them difficult to deploy in complex environments, particularly those with continuous state or action spaces. In this paper we extend the more versatile approximate model-based shielding (AMBS) framework to the continuous setting. In particular we use Safety Gym as our test-bed, allowing for a more direct comparison of AMBS with popular constrained RL algorithms. We also provide strong probabilistic safety guarantees for the continuous setting. In addition, we propose two novel penalty techniques that directly modify the policy gradient, which empirically provide more stable convergence in our experiments.
- [481] arXiv:2402.00822 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: WiOpen: A Robust Wi-Fi-based Open-set Gesture Recognition FrameworkSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Recent years have witnessed a growing interest in Wi-Fi-based gesture recognition. However, existing works have predominantly focused on closed-set paradigms, where all testing gestures are predefined during training. This poses a significant challenge in real-world applications, as unseen gestures might be misclassified as known classes during testing. To address this issue, we propose WiOpen, a robust Wi-Fi-based Open-Set Gesture Recognition (OSGR) framework. Implementing OSGR requires addressing challenges caused by the unique uncertainty in Wi-Fi sensing. This uncertainty, resulting from noise and domains, leads to widely scattered and irregular data distributions in collected Wi-Fi sensing data. Consequently, data ambiguity between classes and challenges in defining appropriate decision boundaries to identify unknowns arise. To tackle these challenges, WiOpen adopts a two-fold approach to eliminate uncertainty and define precise decision boundaries. Initially, it addresses uncertainty induced by noise during data preprocessing by utilizing the CSI ratio. Next, it designs the OSGR network based on an uncertainty quantification method. Throughout the learning process, this network effectively mitigates uncertainty stemming from domains. Ultimately, the network leverages relationships among samples' neighbors to dynamically define open-set decision boundaries, successfully realizing OSGR. Comprehensive experiments on publicly accessible datasets confirm WiOpen's effectiveness. Notably, WiOpen also demonstrates superiority in cross-domain tasks when compared to state-of-the-art approaches.
- [482] arXiv:2402.00823 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: SLIM: Skill Learning with Multiple CriticsComments: Accepted at IEEE ICRA 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: Self-supervised skill learning aims to acquire useful behaviors that leverage the underlying dynamics of the environment. Latent variable models, based on mutual information maximization, have been successful in this task but still struggle in the context of robotic manipulation. As it requires impacting a possibly large set of degrees of freedom composing the environment, mutual information maximization fails alone in producing useful and safe manipulation behaviors. Furthermore, tackling this by augmenting skill discovery rewards with additional rewards through a naive combination might fail to produce desired behaviors. To address this limitation, we introduce SLIM, a multi-critic learning approach for skill discovery with a particular focus on robotic manipulation. Our main insight is that utilizing multiple critics in an actor-critic framework to gracefully combine multiple reward functions leads to a significant improvement in latent-variable skill discovery for robotic manipulation while overcoming possible interference occurring among rewards which hinders convergence to useful skills. Furthermore, in the context of tabletop manipulation, we demonstrate the applicability of our novel skill discovery approach to acquire safe and efficient motor primitives in a hierarchical reinforcement learning fashion and leverage them through planning, significantly surpassing baseline approaches for skill discovery.
- [483] arXiv:2402.00828 (cross-list from eess.AS) [ pdf , ps , other ]
-
Title: Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture of AdaptersComments: The code will be released ad: \url{ this https URL }Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI)
Abstract: Mixture of Experts (MoE) architectures have recently started burgeoning due to their ability to scale model's capacity while maintaining the computational cost affordable. Furthermore, they can be applied to both Transformers and State Space Models, the current state-of-the-art models in numerous fields. While MoE has been mostly investigated for the pre-training stage, its use in parameter-efficient transfer learning settings is under-explored. To narrow this gap, this paper attempts to demystify the use of MoE for parameter-efficient fine-tuning of Audio Spectrogram Transformers to audio and speech downstream tasks. Specifically, we propose Soft Mixture of Adapters (Soft-MoA). It exploits adapters as the experts and, leveraging the recent Soft MoE method, it relies on a soft assignment between the input tokens and experts to keep the computational time limited. Extensive experiments across 4 benchmarks demonstrate that Soft-MoA outperforms the single adapter method and performs on par with the dense MoA counterpart. We finally present ablation studies on key elements of Soft-MoA, showing for example that Soft-MoA achieves better scaling with more experts, as well as ensuring that all experts contribute to the computation of the output tokens, thus dispensing with the expert imbalance issue.
- [484] arXiv:2402.00831 (cross-list from cs.NI) [ pdf , ps , other ]
-
Title: A YANG-aided Unified Strategy for Black Hole Detection for Backbone NetworksSubjects: Networking and Internet Architecture (cs.NI) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Despite the crucial importance of addressing Black Hole failures in Internet backbone networks, effective detection strategies in backbone networks are lacking. This is largely because previous research has been centered on Mobile Ad-hoc Networks (MANETs), which operate under entirely different dynamics, protocols, and topologies, making their findings not directly transferable to backbone networks. Furthermore, detecting Black Hole failures in backbone networks is particularly challenging. It requires a comprehensive range of network data due to the wide variety of conditions that need to be considered, making data collection and analysis far from straightforward. Addressing this gap, our study introduces a novel approach for Black Hole detection in backbone networks using specialized Yet Another Next Generation (YANG) data models with Black Hole-sensitive Metric Matrix (BHMM) analysis. This paper details our method of selecting and analyzing four YANG models relevant to Black Hole detection in ISP networks, focusing on routing protocols and ISP-specific configurations. Our BHMM approach derived from these models demonstrates a 10% improvement in detection accuracy and a 13% increase in packet delivery rate, highlighting the efficiency of our approach. Additionally, we evaluate the Machine Learning approach leveraged with BHMM analysis in two different network settings, a commercial ISP network, and a scientific research-only network topology. This evaluation also demonstrates the practical applicability of our method, yielding significantly improved prediction outcomes in both environments.
- [485] arXiv:2402.00835 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: ALISON: Fast and Effective Stylometric Authorship ObfuscationComments: 10 pages, 6 figures, 4 tables. To be published in the Proceedings of the 38th Annual AAAI Conference on Artificial Intelligence (AAAI-24)Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Authorship Attribution (AA) and Authorship Obfuscation (AO) are two competing tasks of increasing importance in privacy research. Modern AA leverages an author's consistent writing style to match a text to its author using an AA classifier. AO is the corresponding adversarial task, aiming to modify a text in such a way that its semantics are preserved, yet an AA model cannot correctly infer its authorship. To address privacy concerns raised by state-of-the-art (SOTA) AA methods, new AO methods have been proposed but remain largely impractical to use due to their prohibitively slow training and obfuscation speed, often taking hours. To this challenge, we propose a practical AO method, ALISON, that (1) dramatically reduces training/obfuscation time, demonstrating more than 10x faster obfuscation than SOTA AO methods, (2) achieves better obfuscation success through attacking three transformer-based AA methods on two benchmark datasets, typically performing 15% better than competing methods, (3) does not require direct signals from a target AA classifier during obfuscation, and (4) utilizes unique stylometric features, allowing sound model interpretation for explainable obfuscation. We also demonstrate that ALISON can effectively prevent four SOTA AA methods from accurately determining the authorship of ChatGPT-generated texts, all while minimally changing the original text semantics. To ensure the reproducibility of our findings, our code and data are available at: this https URL .
- [486] arXiv:2402.00839 (cross-list from cs.CR) [ pdf , ps , html , other ]
-
Title: X-CBA: Explainability Aided CatBoosted Anomal-E for Intrusion Detection SystemSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
Abstract: The effectiveness of Intrusion Detection Systems (IDS) is critical in an era where cyber threats are becoming increasingly complex. Machine learning (ML) and deep learning (DL) models provide an efficient and accurate solution for identifying attacks and anomalies in computer networks. However, using ML and DL models in IDS has led to a trust deficit due to their non-transparent decision-making. This transparency gap in IDS research is significant, affecting confidence and accountability. To address, this paper introduces a novel Explainable IDS approach, called X-CBA, that leverages the structural advantages of Graph Neural Networks (GNNs) to effectively process network traffic data, while also adapting a new Explainable AI (XAI) methodology. Unlike most GNN-based IDS that depend on labeled network traffic and node features, thereby overlooking critical packet-level information, our approach leverages a broader range of traffic data through network flows, including edge attributes, to improve detection capabilities and adapt to novel threats. Through empirical testing, we establish that our approach not only achieves high accuracy with 99.47% in threat detection but also advances the field by providing clear, actionable explanations of its analytical outcomes. This research also aims to bridge the current gap and facilitate the broader integration of ML/DL technologies in cybersecurity defenses by offering a local and global explainability solution that is both precise and interpretable.
- [487] arXiv:2402.00854 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: SymbolicAI: A framework for logic-based approaches combining generative models and solversMarius-Constantin Dinu , Claudiu Leoveanu-Condrei , Markus Holzleitner , Werner Zellinger , Sepp HochreiterComments: 39 pages, 12 figures, external resources: framework is available at this https URL and benchmark at this https URLSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Symbolic Computation (cs.SC); Software Engineering (cs.SE)
Abstract: We introduce SymbolicAI, a versatile and modular framework employing a logic-based approach to concept learning and flow management in generative processes. SymbolicAI enables the seamless integration of generative models with a diverse range of solvers by treating large language models (LLMs) as semantic parsers that execute tasks based on both natural and formal language instructions, thus bridging the gap between symbolic reasoning and generative AI. We leverage probabilistic programming principles to tackle complex tasks, and utilize differentiable and classical programming paradigms with their respective strengths. The framework introduces a set of polymorphic, compositional, and self-referential operations for data stream manipulation, aligning LLM outputs with user objectives. As a result, we can transition between the capabilities of various foundation models endowed with zero- and few-shot learning capabilities and specialized, fine-tuned models or solvers proficient in addressing specific problems. In turn, the framework facilitates the creation and evaluation of explainable computational graphs. We conclude by introducing a quality measure and its empirical score for evaluating these computational graphs, and propose a benchmark that compares various state-of-the-art LLMs across a set of complex workflows. We refer to the empirical score as the "Vector Embedding for Relational Trajectory Evaluation through Cross-similarity", or VERTEX score for short. The framework codebase and benchmark are linked below.
- [488] arXiv:2402.00861 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Evaluating Large Language Models for Generalization and Robustness via Data CompressionSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Existing methods for evaluating large language models face challenges such as data contamination, sensitivity to prompts, and the high cost of benchmark creation. To address this, we propose a lossless data compression based evaluation approach that tests how models' predictive abilities generalize after their training cutoff. Specifically, we collect comprehensive test data spanning 83 months from 2017 to 2023 and split the data into training and testing periods according to models' training data cutoff. We measure: 1) the compression performance on the testing period as a measure of generalization on unseen data; and 2) the performance gap between the training and testing period as a measure of robustness. Our experiments test 14 representative large language models with various sizes on sources including Wikipedia, news articles, code, arXiv papers, and multi-modal data. We find that the compression rate of many models reduces significantly after their cutoff date, but models such as Mistral and Llama-2 demonstrate a good balance between performance and robustness. Results also suggest that models struggle to generalize on news and code data, but work especially well on arXiv papers. We also find the context size and tokenization implementation have a big impact of on the overall compression performance.
- [489] arXiv:2402.00874 (cross-list from cs.NI) [ pdf , ps , other ]
-
Title: dRG-MEC: Decentralized Reinforced Green Offloading for MEC-enabled Cloud NetworkSubjects: Networking and Internet Architecture (cs.NI) ; Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Abstract: Multi-access-Mobile Edge Computing (MEC) is a promising solution for computationally demanding rigorous applications, that can meet 6G network service requirements. However, edge servers incur high computation costs during task processing. In this paper, we proposed a technique to minimize the total computation and communication overhead for optimal resource utilization with joint computational offloading that enables a green environment. Our optimization problem is NP-hard; thus, we proposed a decentralized Reinforcement Learning (dRL) approach where we eliminate the problem of dimensionality and over-estimation of the value functions. Compared to baseline schemes our technique achieves a 37.03% reduction in total system costs.
- [490] arXiv:2402.00876 (cross-list from cs.NI) [ pdf , ps , other ]
-
Title: Building Blocks to Empower Cognitive Internet with Hybrid Edge CloudSubjects: Networking and Internet Architecture (cs.NI) ; Artificial Intelligence (cs.AI)
Abstract: As we transition from the mobile internet to the 'Cognitive Internet,' a significant shift occurs in how we engage with technology and intelligence. We contend that the Cognitive Internet goes beyond the Cognitive Internet of Things (Cognitive IoT), enabling connected objects to independently acquire knowledge and understanding. Unlike the Mobile Internet and Cognitive IoT, the Cognitive Internet integrates collaborative intelligence throughout the network, blending the cognitive IoT realm with system-wide collaboration and human intelligence. This integrated intelligence facilitates interactions between devices, services, entities, and individuals across diverse domains while preserving decision-making autonomy and accommodating various identities.
The paper delves into the foundational elements, distinct characteristics, benefits, and industrial impact of the 'Cognitive Internet' paradigm. It highlights the importance of adaptable AI infrastructures and hybrid edge cloud (HEC) platforms in enabling this shift. This evolution brings forth cognitive services, a Knowledge as a Service (KaaS) economy, enhanced decision-making autonomy, sustainable digital progress, advancements in data management, processing techniques, and a stronger emphasis on privacy. In essence, this paper serves as a crucial resource for understanding and leveraging the transformative potential of HEC for Cognitive Internet. Supported by case studies, forward-looking perspectives, and real-world applications, it provides comprehensive insights into this emerging paradigm. - [491] arXiv:2402.00881 (cross-list from cs.NI) [ pdf , ps , html , other ]
-
Title: On the Interplay of Artificial Intelligence and Space-Air-Ground Integrated Networks: A SurveySubjects: Networking and Internet Architecture (cs.NI) ; Artificial Intelligence (cs.AI)
Abstract: Space-Air-Ground Integrated Networks (SAGINs), which incorporate space and aerial networks with terrestrial wireless systems, are vital enablers of the emerging sixth-generation (6G) wireless networks. Besides bringing significant benefits to various applications and services, SAGINs are envisioned to extend high-speed broadband coverage to remote areas, such as small towns or mining sites, or areas where terrestrial infrastructure cannot reach, such as airplanes or maritime use cases. However, due to the limited power and storage resources, as well as other constraints introduced by the design of terrestrial networks, SAGINs must be intelligently configured and controlled to satisfy the envisioned requirements. Meanwhile, Artificial Intelligence (AI) is another critical enabler of 6G. Due to massive amounts of available data, AI has been leveraged to address pressing challenges of current and future wireless networks. By adding AI and facilitating the decision-making and prediction procedures, SAGINs can effectively adapt to their surrounding environment, thus enhancing the performance of various metrics. In this work, we aim to investigate the interplay of AI and SAGINs by providing a holistic overview of state-of-the-art research in AI-enabled SAGINs. Specifically, we present a comprehensive overview of some potential applications of AI in SAGINs. We also cover open issues in employing AI and detail the contributions of SAGINs in the development of AI. Finally, we highlight some limitations of the existing research works and outline potential future research directions.
- [492] arXiv:2402.00888 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Security and Privacy Challenges of Large Language Models: A SurveySubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: Large Language Models (LLMs) have demonstrated extraordinary capabilities and contributed to multiple fields, such as generating and summarizing text, language translation, and question-answering. Nowadays, LLM is becoming a very popular tool in computerized language processing tasks, with the capability to analyze complicated linguistic patterns and provide relevant and appropriate responses depending on the context. While offering significant advantages, these models are also vulnerable to security and privacy attacks, such as jailbreaking attacks, data poisoning attacks, and Personally Identifiable Information (PII) leakage attacks. This survey provides a thorough review of the security and privacy challenges of LLMs for both training data and users, along with the application-based risks in various domains, such as transportation, education, and healthcare. We assess the extent of LLM vulnerabilities, investigate emerging security and privacy attacks for LLMs, and review the potential defense mechanisms. Additionally, the survey outlines existing research gaps in this domain and highlights future research directions.
- [493] arXiv:2402.00891 (cross-list from cs.CR) [ pdf , ps , html , other ]
-
Title: Large Language Models in Cybersecurity: State-of-the-ArtFarzad Nourmohammadzadeh Motlagh , Mehrdad Hajizadeh , Mehryar Majd , Pejman Najafi , Feng Cheng , Christoph MeinelSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: The rise of Large Language Models (LLMs) has revolutionized our comprehension of intelligence bringing us closer to Artificial Intelligence. Since their introduction, researchers have actively explored the applications of LLMs across diverse fields, significantly elevating capabilities. Cybersecurity, traditionally resistant to data-driven solutions and slow to embrace machine learning, stands out as a domain. This study examines the existing literature, providing a thorough characterization of both defensive and adversarial applications of LLMs within the realm of cybersecurity. Our review not only surveys and categorizes the current landscape but also identifies critical research gaps. By evaluating both offensive and defensive applications, we aim to provide a holistic understanding of the potential risks and opportunities associated with LLM-driven cybersecurity.
- [494] arXiv:2402.00892 (cross-list from cs.SD) [ pdf , ps , html , other ]
-
Title: EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial NetworksSubjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Abstract: The advent of Large Models marks a new era in machine learning, significantly outperforming smaller models by leveraging vast datasets to capture and synthesize complex patterns. Despite these advancements, the exploration into scaling, especially in the audio generation domain, remains limited, with previous efforts didn't extend into the high-fidelity (HiFi) 44.1kHz domain and suffering from both spectral discontinuities and blurriness in the high-frequency domain, alongside a lack of robustness against out-of-domain data. These limitations restrict the applicability of models to diverse use cases, including music and singing generation. Our work introduces Enhanced Various Audio Generation via Scalable Generative Adversarial Networks (EVA-GAN), yields significant improvements over previous state-of-the-art in spectral and high-frequency reconstruction and robustness in out-of-domain data performance, enabling the generation of HiFi audios by employing an extensive dataset of 36,000 hours of 44.1kHz audio, a context-aware module, a Human-In-The-Loop artifact measurement toolkit, and expands the model to approximately 200 million parameters. Demonstrations of our work are available at this https URL .
- [495] arXiv:2402.00893 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: MoDE: A Mixture-of-Experts Model with Mutual Distillation among the ExpertsComments: Accepted by AAAI-24Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The application of mixture-of-experts (MoE) is gaining popularity due to its ability to improve model's performance. In an MoE structure, the gate layer plays a significant role in distinguishing and routing input features to different experts. This enables each expert to specialize in processing their corresponding sub-tasks. However, the gate's routing mechanism also gives rise to narrow vision: the individual MoE's expert fails to use more samples in learning the allocated sub-task, which in turn limits the MoE to further improve its generalization ability. To effectively address this, we propose a method called Mixture-of-Distilled-Expert (MoDE), which applies moderate mutual distillation among experts to enable each expert to pick up more features learned by other experts and gain more accurate perceptions on their original allocated sub-tasks. We conduct plenty experiments including tabular, NLP and CV datasets, which shows MoDE's effectiveness, universality and robustness. Furthermore, we develop a parallel study through innovatively constructing "expert probing", to experimentally prove why MoDE works: moderate distilling knowledge can improve each individual expert's test performances on their assigned tasks, leading to MoE's overall performance improvement.
- [496] arXiv:2402.00896 (cross-list from cs.CR) [ pdf , ps , html , other ]
-
Title: Privacy and Security Implications of Cloud-Based AI Services : A SurveySubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This paper details the privacy and security landscape in today's cloud ecosystem and identifies that there is a gap in addressing the risks introduced by machine learning models. As machine learning algorithms continue to evolve and find applications across diverse domains, the need to categorize and quantify privacy and security risks becomes increasingly critical. With the emerging trend of AI-as-a-Service (AIaaS), machine learned AI models (or ML models) are deployed on the cloud by model providers and used by model consumers. We first survey the AIaaS landscape to document the various kinds of liabilities that ML models, especially Deep Neural Networks pose and then introduce a taxonomy to bridge this gap by holistically examining the risks that creators and consumers of ML models are exposed to and their known defences till date. Such a structured approach will be beneficial for ML model providers to create robust solutions. Likewise, ML model consumers will find it valuable to evaluate such solutions and understand the implications of their engagement with such services. The proposed taxonomies provide a foundational basis for solutions in private, secure and robust ML, paving the way for more transparent and resilient AI systems.
- [497] arXiv:2402.00899 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Weakly Supervised Learners for Correction of AI Errors with Provable Performance GuaranteesIvan Y. Tyukin , Tatiana Tyukina , Daniel van Helden , Zedong Zheng , Evgeny M. Mirkes , Oliver J. Sutton , Qinghua Zhou , Alexander N. Gorban , Penelope AllisonSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: We present a new methodology for handling AI errors by introducing weakly supervised AI error correctors with a priori performance guarantees. These AI correctors are auxiliary maps whose role is to moderate the decisions of some previously constructed underlying classifier by either approving or rejecting its decisions. The rejection of a decision can be used as a signal to suggest abstaining from making a decision. A key technical focus of the work is in providing performance guarantees for these new AI correctors through bounds on the probabilities of incorrect decisions. These bounds are distribution agnostic and do not rely on assumptions on the data dimension. Our empirical example illustrates how the framework can be applied to improve the performance of an image classifier in a challenging real-world task where training data are scarce.
- [498] arXiv:2402.00904 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Graph Domain Adaptation: Challenges, Progress and ProspectsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: As graph representation learning often suffers from label scarcity problems in real-world applications, researchers have proposed graph domain adaptation (GDA) as an effective knowledge-transfer paradigm across graphs. In particular, to enhance model performance on target graphs with specific tasks, GDA introduces a bunch of task-related graphs as source graphs and adapts the knowledge learnt from source graphs to the target graphs. Since GDA combines the advantages of graph representation learning and domain adaptation, it has become a promising direction of transfer learning on graphs and has attracted an increasing amount of research interest in recent years. In this paper, we comprehensively overview the studies of GDA and present a detailed survey of recent advances. Specifically, we outline the research status and challenges, propose a taxonomy, introduce the details of representative works, and discuss the prospects. To the best of our knowledge, this paper is the first survey for graph domain adaptation. A detailed paper list is available at this https URL .
- [499] arXiv:2402.00910 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Addressing Bias Through Ensemble Learning and Regularized Fine-TuningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Addressing biases in AI models is crucial for ensuring fair and accurate predictions. However, obtaining large, unbiased datasets for training can be challenging. This paper proposes a comprehensive approach using multiple methods to remove bias in AI models, with only a small dataset and a potentially biased pretrained model. We train multiple models with the counter-bias of the pre-trained model through data splitting, local training, and regularized fine-tuning, gaining potentially counter-biased models. Then, we employ ensemble learning for all models to reach unbiased predictions. To further accelerate the inference time of our ensemble model, we conclude our solution with knowledge distillation that results in a single unbiased neural network. We demonstrate the effectiveness of our approach through experiments on the CIFAR10 and HAM10000 datasets, showcasing promising results. This work contributes to the ongoing effort to create more unbiased and reliable AI models, even with limited data availability.
- [500] arXiv:2402.00912 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Can we Constrain Concept Bottleneck Models to Learn Semantically Meaningful Input Features?Comments: Main paper: 7 pages, 8 figures, Appendix: 15 pages, 22 figures. This paper is a preprintSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Concept Bottleneck Models (CBMs) are considered inherently interpretable because they first predict a set of human-defined concepts before using these concepts to predict the output of a downstream task. For inherent interpretability to be fully realised, and ensure trust in a model's output, we need to guarantee concepts are predicted based on semantically mapped input features. For example, one might expect the pixels representing a broken bone in an image to be used for the prediction of a fracture. However, current literature indicates this is not the case, as concept predictions are often mapped to irrelevant input features. We hypothesise that this occurs when concept annotations are inaccurate or how input features should relate to concepts is unclear. In general, the effect of dataset labelling on concept representations in CBMs remains an understudied area. Therefore, in this paper, we examine how CBMs learn concepts from datasets with fine-grained concept annotations. We demonstrate that CBMs can learn concept representations with semantic mapping to input features by removing problematic concept correlations, such as two concepts always appearing together. To support our evaluation, we introduce a new synthetic image dataset based on a playing cards domain, which we hope will serve as a benchmark for future CBM research. For validation, we provide empirical evidence on a real-world dataset of chest X-rays, to demonstrate semantically meaningful concepts can be learned in real-world applications.
- [501] arXiv:2402.00913 (cross-list from cs.CR) [ pdf , ps , html , other ]
-
Title: Institutional Platform for Secure Self-Service Large Language Model ExplorationV. K. Cody Bumgardner , Mitchell A. Klusty , W. Vaiden Logan , Samuel E. Armstrong , Caylin Hickey , Jeff TalbertComments: 10 pages 11 figures, 5 listings, 4 tablesSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: This paper introduces a user-friendly platform developed by the University of Kentucky Center for Applied AI, designed to make large, customized language models (LLMs) more accessible. By capitalizing on recent advancements in multi-LoRA inference, the system efficiently accommodates custom adapters for a diverse range of users and projects. The paper outlines the system's architecture and key features, encompassing dataset curation, model training, secure inference, and text-based feature extraction.
We illustrate the establishment of a tenant-aware computational network using agent-based methods, securely utilizing islands of isolated resources as a unified system. The platform strives to deliver secure LLM services, emphasizing process and data isolation, end-to-end encryption, and role-based resource authentication. This contribution aligns with the overarching goal of enabling simplified access to cutting-edge AI models and technology in support of scientific discovery. - [502] arXiv:2402.00918 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: MUSTAN: Multi-scale Temporal Context as Attention for Robust Video Foreground SegmentationComments: 10 pages, 8 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Video foreground segmentation (VFS) is an important computer vision task wherein one aims to segment the objects under motion from the background. Most of the current methods are image-based, i.e., rely only on spatial cues while ignoring motion cues. Therefore, they tend to overfit the training data and don't generalize well to out-of-domain (OOD) distribution. To solve the above problem, prior works exploited several cues such as optical flow, background subtraction mask, etc. However, having a video data with annotations like optical flow is a challenging task. In this paper, we utilize the temporal information and the spatial cues from the video data to improve OOD performance. However, the challenge lies in how we model the temporal information given the video data in an interpretable way creates a very noticeable difference. We therefore devise a strategy that integrates the temporal context of the video in the development of VFS. Our approach give rise to deep learning architectures, namely MUSTAN1 and MUSTAN2 and they are based on the idea of multi-scale temporal context as an attention, i.e., aids our models to learn better representations that are beneficial for VFS. Further, we introduce a new video dataset, namely Indoor Surveillance Dataset (ISD) for VFS. It has multiple annotations on a frame level such as foreground binary mask, depth map, and instance semantic annotations. Therefore, ISD can benefit other computer vision tasks. We validate the efficacy of our architectures and compare the performance with baselines. We demonstrate that proposed methods significantly outperform the benchmark methods on OOD. In addition, the performance of MUSTAN2 is significantly improved on certain video categories on OOD data due to ISD.
- [503] arXiv:2402.00944 (cross-list from hep-th) [ pdf , ps , html , other ]
-
Title: NCoder -- A Quantum Field Theory approach to encoding dataSubjects: High Energy Physics - Theory (hep-th) ; Disordered Systems and Neural Networks (cond-mat.dis-nn); Artificial Intelligence (cs.AI)
Abstract: In this paper we present a novel approach to interpretable AI inspired by Quantum Field Theory (QFT) which we call the NCoder. The NCoder is a modified autoencoder neural network whose latent layer is prescribed to be a subset of $n$-point correlation functions. Regarding images as draws from a lattice field theory, this architecture mimics the task of perturbatively constructing the effective action of the theory order by order in an expansion using Feynman diagrams. Alternatively, the NCoder may be regarded as simulating the procedure of statistical inference whereby high dimensional data is first summarized in terms of several lower dimensional summary statistics (here the $n$-point correlation functions), and subsequent out-of-sample data is generated by inferring the data generating distribution from these statistics. In this way the NCoder suggests a fascinating correspondence between perturbative renormalizability and the sufficiency of models. We demonstrate the efficacy of the NCoder by applying it to the generation of MNIST images, and find that generated images can be correctly classified using only information from the first three $n$-point functions of the image distribution.
- [504] arXiv:2402.00957 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Credal Learning TheoryComments: 19 pages, 2 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: Statistical learning theory is the foundation of machine learning, providing theoretical bounds for the risk of models learnt from a (single) training set, assumed to issue from an unknown probability distribution. In actual deployment, however, the data distribution may (and often does) vary, causing domain adaptation/generalization issues. In this paper we lay the foundations for a `credal' theory of learning, using convex sets of probabilities (credal sets) to model the variability in the data-generating distribution. Such credal sets, we argue, may be inferred from a finite sample of training sets. Bounds are derived for the case of finite hypotheses spaces (both assuming realizability or not) as well as infinite model spaces, which directly generalize classical results.
- [505] arXiv:2402.00969 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: SPARQL Generation with Entity Pre-trained GPT for KG Question AnsweringComments: 7 pages, 1 figure, 2 tables. For the implementation, see this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Databases (cs.DB); Information Retrieval (cs.IR)
Abstract: Knowledge Graphs popularity has been rapidly growing in last years. All that knowledge is available for people to query it through the many online databases on the internet. Though, it would be a great achievement if non-programmer users could access whatever information they want to know. There has been a lot of effort oriented to solve this task using natural language processing tools and creativity encouragement by way of many challenges. Our approach focuses on assuming a correct entity linking on the natural language questions and training a GPT model to create SPARQL queries from them. We managed to isolate which property of the task can be the most difficult to solve at few or zero-shot and we proposed pre-training on all entities (under CWA) to improve the performance. We obtained a 62.703% accuracy of exact SPARQL matches on testing at 3-shots, a F1 of 0.809 on the entity linking challenge and a F1 of 0.009 on the question answering challenge.
- [506] arXiv:2402.00976 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Investigating Recurrent Transformers with Dynamic HaltSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Abstract: In this paper, we study the inductive biases of two major approaches to augmenting Transformers with a recurrent mechanism - (1) the approach of incorporating a depth-wise recurrence similar to Universal Transformers; and (2) the approach of incorporating a chunk-wise temporal recurrence like Temporal Latent Bottleneck. Furthermore, we propose and investigate novel ways to extend and combine the above methods - for example, we propose a global mean-based dynamic halting mechanism for Universal Transformer and an augmentation of Temporal Latent Bottleneck with elements from Universal Transformer. We compare the models and probe their inductive biases in several diagnostic tasks such as Long Range Arena (LRA), flip-flop language modeling, ListOps, and Logical Inference.
- [507] arXiv:2402.00978 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: An Information-Theoretic Approach to Analyze NLP Classification TasksComments: 21 pages, 10 figures, 11 tablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Information Theory (cs.IT)
Abstract: Understanding the importance of the inputs on the output is useful across many tasks. This work provides an information-theoretic framework to analyse the influence of inputs for text classification tasks. Natural language processing (NLP) tasks take either a single element input or multiple element inputs to predict an output variable, where an element is a block of text. Each text element has two components: an associated semantic meaning and a linguistic realization. Multiple-choice reading comprehension (MCRC) and sentiment classification (SC) are selected to showcase the framework. For MCRC, it is found that the context influence on the output compared to the question influence reduces on more challenging datasets. In particular, more challenging contexts allow a greater variation in complexity of questions. Hence, test creators need to carefully consider the choice of the context when designing multiple-choice questions for assessment. For SC, it is found the semantic meaning of the input text dominates (above 80\% for all datasets considered) compared to its linguistic realisation when determining the sentiment. The framework is made available at: this https URL
- [508] arXiv:2402.01002 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: AI-generated faces influence gender stereotypes and racial homogenizationComments: 47 pages, 19 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Text-to-image generative AI models such as Stable Diffusion are used daily by millions worldwide. However, the extent to which these models exhibit racial and gender stereotypes is not yet fully understood. Here, we document significant biases in Stable Diffusion across six races, two genders, 32 professions, and eight attributes. Additionally, we examine the degree to which Stable Diffusion depicts individuals of the same race as being similar to one another. This analysis reveals significant racial homogenization, e.g., depicting nearly all middle eastern men as dark-skinned, bearded, and wearing a traditional headdress. We then propose novel debiasing solutions that address the above stereotypes. Finally, using a preregistered experiment, we show that being presented with inclusive AI-generated faces reduces people's racial and gender biases, while being presented with non-inclusive ones increases such biases. This persists regardless of whether the images are labeled as AI-generated. Taken together, our findings emphasize the need to address biases and stereotypes in AI-generated content.
- [509] arXiv:2402.01018 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: HR-MultiWOZ: A Task Oriented Dialogue (TOD) Dataset for HR LLM AgentWeijie Xu , Zicheng Huang , Wenxiang Hu , Xi Fang , Rajesh Kumar Cherukuri , Naumaan Nayyar , Lorenzo Malandri , Srinivasan H. SengameduComments: 13 pages, 9 figuresJournal-ref: EACL 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Recent advancements in Large Language Models (LLMs) have been reshaping Natural Language Processing (NLP) task in several domains. Their use in the field of Human Resources (HR) has still room for expansions and could be beneficial for several time consuming tasks. Examples such as time-off submissions, medical claims filing, and access requests are noteworthy, but they are by no means the sole instances. However, the aforementioned developments must grapple with the pivotal challenge of constructing a high-quality training dataset. On one hand, most conversation datasets are solving problems for customers not employees. On the other hand, gathering conversations with HR could raise privacy concerns. To solve it, we introduce HR-Multiwoz, a fully-labeled dataset of 550 conversations spanning 10 HR domains to evaluate LLM Agent. Our work has the following contributions: (1) It is the first labeled open-sourced conversation dataset in the HR domain for NLP research. (2) It provides a detailed recipe for the data generation procedure along with data analysis and human evaluations. The data generation pipeline is transferable and can be easily adapted for labeled conversation data generation in other domains. (3) The proposed data-collection pipeline is mostly based on LLMs with minimal human involvement for annotation, which is time and cost-efficient.
- [510] arXiv:2402.01020 (cross-list from cs.LO) [ pdf , ps , other ]
-
Title: Quantifying analogy of concepts via ologs and wiring diagramsComments: 30 pagesSubjects: Logic in Computer Science (cs.LO) ; Artificial Intelligence (cs.AI); Discrete Mathematics (cs.DM); Combinatorics (math.CO); Category Theory (math.CT)
Abstract: We build on the theory of ontology logs (ologs) created by Spivak and Kent, and define a notion of wiring diagrams. In this article, a wiring diagram is a finite directed labelled graph. The labels correspond to types in an olog; they can also be interpreted as readings of sensors in an autonomous system. As such, wiring diagrams can be used as a framework for an autonomous system to form abstract concepts. We show that the graphs underlying skeleton wiring diagrams form a category. This allows skeleton wiring diagrams to be compared and manipulated using techniques from both graph theory and category theory. We also extend the usual definition of graph edit distance to the case of wiring diagrams by using operations only available to wiring diagrams, leading to a metric on the set of all skeleton wiring diagrams. In the end, we give an extended example on calculating the distance between two concepts represented by wiring diagrams, and explain how to apply our framework to any application domain.
- [511] arXiv:2402.01030 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Executable Code Actions Elicit Better LLM AgentsComments: Code, data, model, and demo are available at this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large Language Model (LLM) agents, capable of performing a broad range of actions, such as invoking tools and controlling robots, show great potential in tackling real-world challenges. LLM agents are typically prompted to produce actions by generating JSON or text in a pre-defined format, which is usually limited by constrained action space (e.g., the scope of pre-defined tools) and restricted flexibility (e.g., inability to compose multiple tools). This work proposes to use executable Python code to consolidate LLM agents' actions into a unified action space (CodeAct). Integrated with a Python interpreter, CodeAct can execute code actions and dynamically revise prior actions or emit new actions upon new observations through multi-turn interactions. Our extensive analysis of 17 LLMs on API-Bank and a newly curated benchmark shows that CodeAct outperforms widely used alternatives (up to 20% higher success rate). The encouraging performance of CodeAct motivates us to build an open-source LLM agent that interacts with environments by executing interpretable code and collaborates with users using natural language. To this end, we collect an instruction-tuning dataset CodeActInstruct that consists of 7k multi-turn interactions using CodeAct. We show that it can be used with existing data to improve models in agent-oriented tasks without compromising their general capability. CodeActAgent, finetuned from Llama2 and Mistral, is integrated with Python interpreter and uniquely tailored to perform sophisticated tasks (e.g., model training) using existing libraries and autonomously self-debug.
- [512] arXiv:2402.01032 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Repeat After Me: Transformers are Better than State Space Models at CopyingSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Transformers are the dominant architecture for sequence modeling, but there is growing interest in models that use a fixed-size latent state that does not depend on the sequence length, which we refer to as "generalized state space models" (GSSMs). In this paper we show that while GSSMs are promising in terms of inference-time efficiency, they are limited compared to transformer models on tasks that require copying from the input context. We start with a theoretical analysis of the simple task of string copying and prove that a two layer transformer can copy strings of exponential length while GSSMs are fundamentally limited by their fixed-size latent state. Empirically, we find that transformers outperform GSSMs in terms of efficiency and generalization on synthetic tasks that require copying the context. Finally, we evaluate pretrained large language models and find that transformer models dramatically outperform state space models at copying and retrieving information from context. Taken together, these results suggest a fundamental gap between transformers and GSSMs on tasks of practical interest.
- [513] arXiv:2402.01053 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Plan-Grounded Large Language Models for Dual Goal Conversational SettingsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Training Large Language Models (LLMs) to follow user instructions has been shown to supply the LLM with ample capacity to converse fluently while being aligned with humans. Yet, it is not completely clear how an LLM can lead a plan-grounded conversation in mixed-initiative settings where instructions flow in both directions of the conversation, i.e. both the LLM and the user provide instructions to one another. In this paper, we tackle a dual goal mixed-initiative conversational setting where the LLM not only grounds the conversation on an arbitrary plan but also seeks to satisfy both a procedural plan and user instructions. The LLM is then responsible for guiding the user through the plan and, at the same time, adapting to new circumstances, answering questions, and activating safety guardrails when needed. We propose a novel LLM that grounds the dialogue on a procedural plan, can take the dialogue initiative, and enforces guardrails on the system's behavior, while also improving the LLM's responses to unexpected user behavior. Experiments in controlled settings and with real users show that the best-performing model, which we call PlanLLM, achieves a 2.1x improvement over a strong baseline. Moreover, experiments also show good generalization to unseen domains.
- [514] arXiv:2402.01065 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Evaluation Methodology for Large Language Models for Multilingual Document Question and AnswerSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: With the widespread adoption of Large Language Models (LLMs), in this paper we investigate the multilingual capability of these models. Our preliminary results show that, translating the native language context, question and answer into a high resource language produced the best results.
- [515] arXiv:2402.01077 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Recent Advances in Predictive Modeling with Electronic Health RecordsJiaqi Wang , Junyu Luo , Muchao Ye , Xiaochen Wang , Yuan Zhong , Aofei Chang , Guanjie Huang , Ziyi Yin , Cao Xiao , Jimeng Sun , Fenglong MaSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The development of electronic health records (EHR) systems has enabled the collection of a vast amount of digitized patient data. However, utilizing EHR data for predictive modeling presents several challenges due to its unique characteristics. With the advancements in machine learning techniques, deep learning has demonstrated its superiority in various applications, including healthcare. This survey systematically reviews recent advances in deep learning-based predictive models using EHR data. Specifically, we begin by introducing the background of EHR data and providing a mathematical definition of the predictive modeling task. We then categorize and summarize predictive deep models from multiple perspectives. Furthermore, we present benchmarks and toolkits relevant to predictive modeling in healthcare. Finally, we conclude this survey by discussing open challenges and suggesting promising directions for future research.
- [516] arXiv:2402.01095 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: How many views does your deep neural network use for prediction?Comments: 20 pagesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Abstract: The generalization ability of Deep Neural Networks (DNNs) is still not fully understood, despite numerous theoretical and empirical analyses. Recently, Allen-Zhu & Li (2023) introduced the concept of multi-views to explain the generalization ability of DNNs, but their main target is ensemble or distilled models, and no method for estimating multi-views used in a prediction of a specific input is discussed. In this paper, we propose Minimal Sufficient Views (MSVs), which is similar to multi-views but can be efficiently computed for real images. MSVs is a set of minimal and distinct features in an input, each of which preserves a model's prediction for the input. We empirically show that there is a clear relationship between the number of MSVs and prediction accuracy across models, including convolutional and transformer models, suggesting that a multi-view like perspective is also important for understanding the generalization ability of (non-ensemble or non-distilled) DNNs.
- [517] arXiv:2402.01096 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Trustworthy Distributed AI Systems: Robustness, Privacy, and GovernanceComments: Manuscript accepted to ACM Computing SurveysSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: Emerging Distributed AI systems are revolutionizing big data computing and data processing capabilities with growing economic and societal impact. However, recent studies have identified new attack surfaces and risks caused by security, privacy, and fairness issues in AI systems. In this paper, we review representative techniques, algorithms, and theoretical foundations for trustworthy distributed AI through robustness guarantee, privacy protection, and fairness awareness in distributed learning. We first provide a brief overview of alternative architectures for distributed learning, discuss inherent vulnerabilities for security, privacy, and fairness of AI algorithms in distributed learning, and analyze why these problems are present in distributed learning regardless of specific architectures. Then we provide a unique taxonomy of countermeasures for trustworthy distributed AI, covering (1) robustness to evasion attacks and irregular queries at inference, and robustness to poisoning attacks, Byzantine attacks, and irregular data distribution during training; (2) privacy protection during distributed learning and model inference at deployment; and (3) AI fairness and governance with respect to both data and models. We conclude with a discussion on open challenges and future research directions toward trustworthy distributed AI, such as the need for trustworthy AI policy guidelines, the AI responsibility-utility co-design, and incentives and compliance.
- [518] arXiv:2402.01103 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Compositional Generative Modeling: A Single Model is Not All You NeedSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract: Large monolithic generative models trained on massive amounts of data have become an increasingly dominant approach in AI research. In this paper, we argue that we should instead construct large generative systems by composing smaller generative models together. We show how such a compositional generative approach enables us to learn distributions in a more data-efficient manner, enabling generalization to parts of the data distribution unseen at training time. We further show how this enables us to program and construct new generative models for tasks completely unseen at training. Finally, we show that in many cases, we can discover separate compositional components from data.
- [519] arXiv:2402.01107 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Simulation of Graph Algorithms with Looped TransformersComments: 45 pages, 2 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS)
Abstract: The execution of graph algorithms using neural networks has recently attracted significant interest due to promising empirical progress. This motivates further understanding of how neural networks can replicate reasoning steps with relational data. In this work, we study the ability of transformer networks to simulate algorithms on graphs from a theoretical perspective. The architecture that we utilize is a looped transformer with extra attention heads that interact with the graph. We prove by construction that this architecture can simulate algorithms such as Dijkstra's shortest path algorithm, Breadth- and Depth-First Search, and Kosaraju's strongly connected components algorithm. The width of the network does not increase with the size of the input graph, which implies that the network can simulate the above algorithms for any graph. Despite this property, we show that there is a limit to simulation in our solution due to finite precision. Finally, we show a Turing Completeness result with constant width when the extra attention heads are utilized.
- [520] arXiv:2402.01111 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Near-Optimal Reinforcement Learning with Self-Play under Adaptivity ConstraintsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
Abstract: We study the problem of multi-agent reinforcement learning (MARL) with adaptivity constraints -- a new problem motivated by real-world applications where deployments of new policies are costly and the number of policy updates must be minimized. For two-player zero-sum Markov Games, we design a (policy) elimination based algorithm that achieves a regret of $\widetilde{O}(\sqrt{H^3 S^2 ABK})$, while the batch complexity is only $O(H+\log\log K)$. In the above, $S$ denotes the number of states, $A,B$ are the number of actions for the two players respectively, $H$ is the horizon and $K$ is the number of episodes. Furthermore, we prove a batch complexity lower bound $\Omega(\frac{H}{\log_{A}K}+\log\log K)$ for all algorithms with $\widetilde{O}(\sqrt{K})$ regret bound, which matches our upper bound up to logarithmic factors. As a byproduct, our techniques naturally extend to learning bandit games and reward-free MARL within near optimal batch complexity. To the best of our knowledge, these are the first line of results towards understanding MARL with low adaptivity.
- [521] arXiv:2402.01114 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Double-Dip: Thwarting Label-Only Membership Inference Attacks with Transfer Learning and RandomizationArezoo Rajabi , Reeya Pimple , Aiswarya Janardhanan , Surudhi Asokraj , Bhaskar Ramasubramanian , Radha PoovendranSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: Transfer learning (TL) has been demonstrated to improve DNN model performance when faced with a scarcity of training samples. However, the suitability of TL as a solution to reduce vulnerability of overfitted DNNs to privacy attacks is unexplored. A class of privacy attacks called membership inference attacks (MIAs) aim to determine whether a given sample belongs to the training dataset (member) or not (nonmember). We introduce Double-Dip, a systematic empirical study investigating the use of TL (Stage-1) combined with randomization (Stage-2) to thwart MIAs on overfitted DNNs without degrading classification accuracy. Our study examines the roles of shared feature space and parameter values between source and target models, number of frozen layers, and complexity of pretrained models. We evaluate Double-Dip on three (Target, Source) dataset paris: (i) (CIFAR-10, ImageNet), (ii) (GTSRB, ImageNet), (iii) (CelebA, VGGFace2). We consider four publicly available pretrained DNNs: (a) VGG-19, (b) ResNet-18, (c) Swin-T, and (d) FaceNet. Our experiments demonstrate that Stage-1 reduces adversary success while also significantly increasing classification accuracy of nonmembers against an adversary with either white-box or black-box DNN model access, attempting to carry out SOTA label-only MIAs. After Stage-2, success of an adversary carrying out a label-only MIA is further reduced to near 50%, bringing it closer to a random guess and showing the effectiveness of Double-Dip. Stage-2 of Double-Dip also achieves lower ASR and higher classification accuracy than regularization and differential privacy-based methods.
- [522] arXiv:2402.01134 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: DeepAAT: Deep Automated Aerial Triangulation for Fast UAV-based MappingSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Automated Aerial Triangulation (AAT), aiming to restore image pose and reconstruct sparse points simultaneously, plays a pivotal role in earth observation. With its rich research heritage spanning several decades in photogrammetry, AAT has evolved into a fundamental process widely applied in large-scale Unmanned Aerial Vehicle (UAV) based mapping. Despite its advancements, classic AAT methods still face challenges like low efficiency and limited robustness. This paper introduces DeepAAT, a deep learning network designed specifically for AAT of UAV imagery. DeepAAT considers both spatial and spectral characteristics of imagery, enhancing its capability to resolve erroneous matching pairs and accurately predict image poses. DeepAAT marks a significant leap in AAT's efficiency, ensuring thorough scene coverage and precision. Its processing speed outpaces incremental AAT methods by hundreds of times and global AAT methods by tens of times while maintaining a comparable level of reconstruction accuracy. Additionally, DeepAAT's scene clustering and merging strategy facilitate rapid localization and pose determination for large-scale UAV images, even under constrained computing resources. The experimental results demonstrate DeepAAT's substantial improvements over conventional AAT methods, highlighting its potential in the efficiency and accuracy of UAV-based 3D reconstruction tasks. To benefit the photogrammetry society, the code of DeepAAT will be released at: this https URL .
- [523] arXiv:2402.01140 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Root Cause Analysis In Microservice Using Neural Granger Causal DiscoveryComments: AAAI 2024 Main TrackSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: In recent years, microservices have gained widespread adoption in IT operations due to their scalability, maintenance, and flexibility. However, it becomes challenging for site reliability engineers (SREs) to pinpoint the root cause due to the complex relationships in microservices when facing system malfunctions. Previous research employed structured learning methods (e.g., PC-algorithm) to establish causal relationships and derive root causes from causal graphs. Nevertheless, they ignored the temporal order of time series data and failed to leverage the rich information inherent in the temporal relationships. For instance, in cases where there is a sudden spike in CPU utilization, it can lead to an increase in latency for other microservices. However, in this scenario, the anomaly in CPU utilization occurs before the latency increase, rather than simultaneously. As a result, the PC-algorithm fails to capture such characteristics. To address these challenges, we propose RUN, a novel approach for root cause analysis using neural Granger causal discovery with contrastive learning. RUN enhances the backbone encoder by integrating contextual information from time series, and leverages a time series forecasting model to conduct neural Granger causal discovery. In addition, RUN incorporates Pagerank with a personalization vector to efficiently recommend the top-k root causes. Extensive experiments conducted on the synthetic and real-world microservice-based datasets demonstrate that RUN noticeably outperforms the state-of-the-art root cause analysis methods. Moreover, we provide an analysis scenario for the sock-shop case to showcase the practicality and efficacy of RUN in microservice-based applications. Our code is publicly available at this https URL .
- [524] arXiv:2402.01143 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Learning Network Representations with Disentangled Graph Auto-EncoderComments: 61 pages, 13 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: The (variational) graph auto-encoder is extensively employed for learning representations of graph-structured data. However, the formation of real-world graphs is a complex and heterogeneous process influenced by latent factors. Existing encoders are fundamentally holistic, neglecting the entanglement of latent factors. This not only makes graph analysis tasks less effective but also makes it harder to understand and explain the representations. Learning disentangled graph representations with (variational) graph auto-encoder poses significant challenges, and remains largely unexplored in the existing literature. In this article, we introduce the Disentangled Graph Auto-Encoder (DGA) and Disentangled Variational Graph Auto-Encoder (DVGA), approaches that leverage generative models to learn disentangled representations. Specifically, we first design a disentangled graph convolutional network with multi-channel message-passing layers, as the encoder aggregating information related to each disentangled latent factor. Subsequently, a component-wise flow is applied to each channel to enhance the expressive capabilities of disentangled variational graph auto-encoder. Additionally, we design a factor-wise decoder, considering the characteristics of disentangled representations. In order to further enhance the independence among representations, we introduce independence constraints on mapping channels for different latent factors. Empirical experiments on both synthetic and real-world datasets show the superiority of our proposed method compared to several state-of-the-art baselines.
- [525] arXiv:2402.01145 (cross-list from cs.NE) [ pdf , ps , html , other ]
-
Title: ReEvo: Large Language Models as Hyper-Heuristics with Reflective EvolutionSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI)
Abstract: The omnipresence of NP-hard combinatorial optimization problems (COPs) compels domain experts to engage in trial-and-error heuristic design process. The long-standing endeavor of design automation has gained new momentum with the rise of large language models (LLMs). This paper introduces Language Hyper-Heuristics (LHHs), an emerging variant of Hyper-Heuristics that leverages LLMs for heuristic generation, featuring minimal manual intervention and open-ended heuristic spaces. To empower LHHs, we present Reflective Evolution (ReEvo), a generic searching framework that emulates the reflective design approach of human experts while far surpassing human capabilities with its scalable LLM inference, Internet-scale domain knowledge, and powerful evolutionary search. Evaluations across 12 COP settings show that 1) verbal reflections for evolution lead to smoother fitness landscapes, explicit inference of black-box COP settings, and better search results; 2) heuristics generated by ReEvo in minutes can outperform state-of-the-art human designs and neural solvers; 3) LHHs enable efficient algorithm design automation even when challenged with black-box COPs, demonstrating its potential for complex and novel real-world applications. Our code is available: this https URL .
- [526] arXiv:2402.01162 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: 2AFC Prompting of Large Multimodal Models for Image Quality AssessmentSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: While abundant research has been conducted on improving high-level visual understanding and reasoning capabilities of large multimodal models~(LMMs), their visual quality assessment~(IQA) ability has been relatively under-explored. Here we take initial steps towards this goal by employing the two-alternative forced choice~(2AFC) prompting, as 2AFC is widely regarded as the most reliable way of collecting human opinions of visual quality. Subsequently, the global quality score of each image estimated by a particular LMM can be efficiently aggregated using the maximum a posterior estimation. Meanwhile, we introduce three evaluation criteria: consistency, accuracy, and correlation, to provide comprehensive quantifications and deeper insights into the IQA capability of five LMMs. Extensive experiments show that existing LMMs exhibit remarkable IQA ability on coarse-grained quality comparison, but there is room for improvement on fine-grained quality discrimination. The proposed dataset sheds light on the future development of IQA models based on LMMs. The codes will be made publicly available at this https URL .
- [527] arXiv:2402.01166 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: A Comprehensive Survey on 3D Content GenerationJian Liu , Xiaoshui Huang , Tianyu Huang , Lu Chen , Yuenan Hou , Shixiang Tang , Ziwei Liu , Wanli Ouyang , Wangmeng Zuo , Junjun Jiang , Xianming LiuComments: under reviewSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Recent years have witnessed remarkable advances in artificial intelligence generated content(AIGC), with diverse input modalities, e.g., text, image, video, audio and 3D. The 3D is the most close visual modality to real-world 3D environment and carries enormous knowledge. The 3D content generation shows both academic and practical values while also presenting formidable technical challenges. This review aims to consolidate developments within the burgeoning domain of 3D content generation. Specifically, a new taxonomy is proposed that categorizes existing approaches into three types: 3D native generative methods, 2D prior-based 3D generative methods, and hybrid 3D generative methods. The survey covers approximately 60 papers spanning the major techniques. Besides, we discuss limitations of current 3D content generation techniques, and point out open challenges as well as promising directions for future work. Accompanied with this survey, we have established a project website where the resources on 3D content generation research are provided. The project page is available at this https URL .
- [528] arXiv:2402.01169 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Faster Inference of Integer SWIN Transformer by Removing the GELU ActivationComments: 5 pages, 1 figure. Submitted to Edge Intelligence Workshop III, an AAAI 2024 workshopSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: SWIN transformer is a prominent vision transformer model that has state-of-the-art accuracy in image classification tasks. Despite this success, its unique architecture causes slower inference compared with similar deep neural networks. Integer quantization of the model is one of the methods used to improve its inference latency. However, state-of-the-art has not been able to fully quantize the model. In this work, we improve upon the inference latency of the state-of-the-art methods by removing the floating-point operations, which are associated with the GELU activation in Swin Transformer. While previous work proposed to replace the non-integer operations with linear approximation functions, we propose to replace GELU with ReLU activation. The advantage of ReLU over previous methods is its low memory and computation complexity. We use iterative knowledge distillation to compensate for the lost accuracy due to replacing GELU with ReLU. We quantize our GELU-less SWIN transformer and show that on an RTX 4090 NVIDIA GPU we can improve the inference latency of the quantized SWIN transformer by at least $11\%$ while maintaining an accuracy drop of under $0.5\%$ on the ImageNet evaluation dataset.
- [529] arXiv:2402.01195 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Conditional Normalizing Flows for Active Learning of Coarse-Grained Molecular RepresentationsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Chemical Physics (physics.chem-ph)
Abstract: Efficient sampling of the Boltzmann distribution of molecular systems is a long-standing challenge. Recently, instead of generating long molecular dynamics simulations, generative machine learning methods such as normalizing flows have been used to learn the Boltzmann distribution directly, without samples. However, this approach is susceptible to mode collapse and thus often does not explore the full configurational space. In this work, we address this challenge by separating the problem into two levels, the fine-grained and coarse-grained degrees of freedom. A normalizing flow conditioned on the coarse-grained space yields a probabilistic connection between the two levels. To explore the configurational space, we employ coarse-grained simulations with active learning which allows us to update the flow and make all-atom potential energy evaluations only when necessary. Using alanine dipeptide as an example, we show that our methods obtain a speedup to molecular dynamics simulations of approximately 15.9 to 216.2 compared to the speedup of 4.5 of the current state-of-the-art machine learning approach.
- [530] arXiv:2402.01201 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Few-Shot Class-Incremental Learning with Prior KnowledgeSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: To tackle the issues of catastrophic forgetting and overfitting in few-shot class-incremental learning (FSCIL), previous work has primarily concentrated on preserving the memory of old knowledge during the incremental phase. The role of pre-trained model in shaping the effectiveness of incremental learning is frequently underestimated in these studies. Therefore, to enhance the generalization ability of the pre-trained model, we propose Learning with Prior Knowledge (LwPK) by introducing nearly free prior knowledge from a few unlabeled data of subsequent incremental classes. We cluster unlabeled incremental class samples to produce pseudo-labels, then jointly train these with labeled base class samples, effectively allocating embedding space for both old and new class data. Experimental results indicate that LwPK effectively enhances the model resilience against catastrophic forgetting, with theoretical analysis based on empirical risk minimization and class distance measurement corroborating its operational principles. The source code of LwPK is publicly available at: \url{ this https URL }.
- [531] arXiv:2402.01204 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: A Survey on Self-Supervised Learning for Non-Sequential Tabular DataComments: The paper list can be found at this https URLSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Self-supervised learning (SSL) has been incorporated into many state-of-the-art models in various domains, where SSL defines pretext tasks based on unlabeled datasets to learn contextualized and robust representations. Recently, SSL has been a new trend in exploring the representation learning capability in the realm of tabular data, which is more challenging due to not having explicit relations for learning descriptive representations. This survey aims to systematically review and summarize the recent progress and challenges of SSL for non-sequential tabular data (SSL4NS-TD). We first present a formal definition of NS-TD and clarify its correlation to related studies. Then, these approaches are categorized into three groups -- predictive learning, contrastive learning, and hybrid learning, with their motivations and strengths of representative methods within each direction. On top of this, application issues of SSL4NS-TD are presented, including automatic data engineering, cross-table transferability, and domain knowledge integration. In addition, we elaborate on existing benchmarks and datasets for NS-TD applications to discuss the performance of existing tabular models. Finally, we discuss the challenges of SSL4NS-TD and provide potential directions for future research. We expect our work to be useful in terms of encouraging more research on lowering the barrier to entry SSL for the tabular domain and improving the foundations for implicit tabular data.
- [532] arXiv:2402.01207 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Efficient Causal Graph Discovery Using Large Language ModelsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Methodology (stat.ME)
Abstract: We propose a novel framework that leverages LLMs for full causal graph discovery. While previous LLM-based methods have used a pairwise query approach, this requires a quadratic number of queries which quickly becomes impractical for larger causal graphs. In contrast, the proposed framework uses a breadth-first search (BFS) approach which allows it to use only a linear number of queries. We also show that the proposed method can easily incorporate observational data when available, to improve performance. In addition to being more time and data-efficient, the proposed framework achieves state-of-the-art results on real-world causal graphs of varying sizes. The results demonstrate the effectiveness and efficiency of the proposed method in discovering causal relationships, showcasing its potential for broad applicability in causal graph discovery tasks across different domains.
- [533] arXiv:2402.01208 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Location Agnostic Adaptive Rain Precipitation Prediction using Deep LearningMd Shazid Islam , Md Saydur Rahman , Md Saad Ul Haque , Farhana Akter Tumpa , Md Sanzid Bin Hossain , Abul Al ArabiSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Rain precipitation prediction is a challenging task as it depends on weather and meteorological features which vary from location to location. As a result, a prediction model that performs well at one location does not perform well at other locations due to the distribution shifts. In addition, due to global warming, the weather patterns are changing very rapidly year by year which creates the possibility of ineffectiveness of those models even at the same location as time passes. In our work, we have proposed an adaptive deep learning-based framework in order to provide a solution to the aforementioned challenges. Our method can generalize the model for the prediction of precipitation for any location where the methods without adaptation fail. Our method has shown 43.51%, 5.09%, and 38.62% improvement after adaptation using a deep neural network for predicting the precipitation of Paris, Los Angeles, and Tokyo, respectively.
- [534] arXiv:2402.01219 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: AI Code Generators for Security: Friend or Foe?Comments: Dataset available at: this https URLJournal-ref: IEEE Security & Privacy, Early Access, February 2024Subjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Abstract: Recent advances of artificial intelligence (AI) code generators are opening new opportunities in software security research, including misuse by malicious actors. We review use cases for AI code generators for security and introduce an evaluation benchmark.
- [535] arXiv:2402.01227 (cross-list from cs.SD) [ pdf , ps , other ]
-
Title: STAA-Net: A Sparse and Transferable Adversarial Attack for Speech Emotion RecognitionYi Chang , Zhao Ren , Zixing Zhang , Xin Jing , Kun Qian , Xi Shao , Bin Hu , Tanja Schultz , Björn W. SchullerSubjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
Abstract: Speech contains rich information on the emotions of humans, and Speech Emotion Recognition (SER) has been an important topic in the area of human-computer interaction. The robustness of SER models is crucial, particularly in privacy-sensitive and reliability-demanding domains like private healthcare. Recently, the vulnerability of deep neural networks in the audio domain to adversarial attacks has become a popular area of research. However, prior works on adversarial attacks in the audio domain primarily rely on iterative gradient-based techniques, which are time-consuming and prone to overfitting the specific threat model. Furthermore, the exploration of sparse perturbations, which have the potential for better stealthiness, remains limited in the audio domain. To address these challenges, we propose a generator-based attack method to generate sparse and transferable adversarial examples to deceive SER models in an end-to-end and efficient manner. We evaluate our method on two widely-used SER datasets, Database of Elicited Mood in Speech (DEMoS) and Interactive Emotional dyadic MOtion CAPture (IEMOCAP), and demonstrate its ability to generate successful sparse adversarial examples in an efficient manner. Moreover, our generated adversarial examples exhibit model-agnostic transferability, enabling effective adversarial attacks on advanced victim models.
- [536] arXiv:2402.01238 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Flexible Variational Information Bottleneck: Achieving Diverse Compression with a Single TrainingSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Information Theory (cs.IT)
Abstract: Information Bottleneck (IB) is a widely used framework that enables the extraction of information related to a target random variable from a source random variable. In the objective function, IB controls the trade-off between data compression and predictiveness through the Lagrange multiplier $\beta$. Traditionally, to find the trade-off to be learned, IB requires a search for $\beta$ through multiple training cycles, which is computationally expensive. In this study, we introduce Flexible Variational Information Bottleneck (FVIB), an innovative framework for classification task that can obtain optimal models for all values of $\beta$ with single, computationally efficient training. We theoretically demonstrate that across all values of reasonable $\beta$, FVIB can simultaneously maximize an approximation of the objective function for Variational Information Bottleneck (VIB), the conventional IB method. Then we empirically show that FVIB can learn the VIB objective as effectively as VIB. Furthermore, in terms of calibration performance, FVIB outperforms other IB and calibration methods by enabling continuous optimization of $\beta$. Our codes are available at this https URL .
- [537] arXiv:2402.01239 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: PRIME: Protect Your Videos From Malicious EditingSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: With the development of generative models, the quality of generated content keeps increasing. Recently, open-source models have made it surprisingly easy to manipulate and edit photos and videos, with just a few simple prompts. While these cutting-edge technologies have gained popularity, they have also given rise to concerns regarding the privacy and portrait rights of individuals. Malicious users can exploit these tools for deceptive or illegal purposes. Although some previous works focus on protecting photos against generative models, we find there are still gaps between protecting videos and images in the aspects of efficiency and effectiveness. Therefore, we introduce our protection method, PRIME, to significantly reduce the time cost and improve the protection performance. Moreover, to evaluate our proposed protection method, we consider both objective metrics and human subjective metrics. Our evaluation results indicate that PRIME only costs 8.3% GPU hours of the cost of the previous state-of-the-art method and achieves better protection results on both human evaluation and objective metrics. Code can be found in this https URL .
- [538] arXiv:2402.01240 (cross-list from cs.CR) [ pdf , ps , html , other ]
-
Title: Beyond the Request: Harnessing HTTP Response Headers for Cross-Browser Web Tracker Classification in an Imbalanced SettingSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The World Wide Web's connectivity is greatly attributed to the HTTP protocol, with HTTP messages offering informative header fields that appeal to disciplines like web security and privacy, especially concerning web tracking. Despite existing research employing HTTP/S request messages to identify web trackers, HTTP/S response headers are often overlooked. This study endeavors to design effective machine learning classifiers for web tracker detection using HTTP/S response headers. Data from the Chrome, Firefox, and Brave browsers, obtained through the traffic monitoring browser extension T.EX, serves as our data set. Eleven supervised models were trained on Chrome data and tested across all browsers. The results demonstrated high accuracy, F1-score, precision, recall, and minimal log-loss error for Chrome and Firefox, but subpar performance on Brave, potentially due to its distinct data distribution and feature set. The research suggests that these classifiers are viable for detecting web trackers in Chrome and Firefox. However, real-world application testing remains pending, and the distinction between tracker types and broader label sources could be explored in future studies.
- [539] arXiv:2402.01241 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Can Shape-Infused Joint Embeddings Improve Image-Conditioned 3D Diffusion?Comments: 10 pages, 6 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Recent advancements in deep generative models, particularly with the application of CLIP (Contrastive Language Image Pretraining) to Denoising Diffusion Probabilistic Models (DDPMs), have demonstrated remarkable effectiveness in text to image generation. The well structured embedding space of CLIP has also been extended to image to shape generation with DDPMs, yielding notable results. Despite these successes, some fundamental questions arise: Does CLIP ensure the best results in shape generation from images? Can we leverage conditioning to bring explicit 3D knowledge into the generative process and obtain better quality? This study introduces CISP (Contrastive Image Shape Pre training), designed to enhance 3D shape synthesis guided by 2D images. CISP aims to enrich the CLIP framework by aligning 2D images with 3D shapes in a shared embedding space, specifically capturing 3D characteristics potentially overlooked by CLIP's text image focus. Our comprehensive analysis assesses CISP's guidance performance against CLIP guided models, focusing on generation quality, diversity, and coherence of the produced shapes with the conditioning image. We find that, while matching CLIP in generation quality and diversity, CISP substantially improves coherence with input images, underscoring the value of incorporating 3D knowledge into generative models. These findings suggest a promising direction for advancing the synthesis of 3D visual content by integrating multimodal systems with 3D representations.
- [540] arXiv:2402.01259 (cross-list from cs.NI) [ pdf , ps , html , other ]
-
Title: Position Aware 60 GHz mmWave Beamforming for V2V Communications Utilizing Deep LearningComments: 2024 IEEE International Conference on Communications (ICC), Denver, CO, USASubjects: Networking and Internet Architecture (cs.NI) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Social and Information Networks (cs.SI); Systems and Control (eess.SY)
Abstract: Beamforming techniques are considered as essential parts to compensate the severe path loss in millimeter-wave (mmWave) communications by adopting large antenna arrays and formulating narrow beams to obtain satisfactory received powers. However, performing accurate beam alignment over such narrow beams for efficient link configuration by traditional beam selection approaches, mainly relied on channel state information, typically impose significant latency and computing overheads, which is often infeasible in vehicle-to-vehicle (V2V) communications like highly dynamic scenarios. In contrast, utilizing out-of-band contextual information, such as vehicular position information, is a potential alternative to reduce such overheads. In this context, this paper presents a deep learning-based solution on utilizing the vehicular position information for predicting the optimal beams having sufficient mmWave received powers so that the best V2V line-of-sight links can be ensured proactively. After experimental evaluation of the proposed solution on real-world measured mmWave sensing and communications datasets, the results show that the solution can achieve up to 84.58% of received power of link status on average, which confirm a promising solution for beamforming in mmWave at 60 GHz enabled V2V communications.
- [541] arXiv:2402.01261 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: TEDDY: Trimming Edges with Degree-based Discrimination strategYSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Since the pioneering work on the lottery ticket hypothesis for graph neural networks (GNNs) was proposed in Chen et al. (2021), the study on finding graph lottery tickets (GLT) has become one of the pivotal focus in the GNN community, inspiring researchers to discover sparser GLT while achieving comparable performance to original dense networks. In parallel, the graph structure has gained substantial attention as a crucial factor in GNN training dynamics, also elucidated by several recent studies. Despite this, contemporary studies on GLT, in general, have not fully exploited inherent pathways in the graph structure and identified tickets in an iterative manner, which is time-consuming and inefficient. To address these limitations, we introduce TEDDY, a one-shot edge sparsification framework that leverages structural information by incorporating edge-degree information. Following edge sparsification, we encourage the parameter sparsity during training via simple projected gradient descent on the $\ell_0$ ball. Given the target sparsity levels for both the graph structure and the model parameters, our TEDDY facilitates efficient and rapid realization of GLT within a single training. Remarkably, our experimental results demonstrate that TEDDY significantly surpasses conventional iterative approaches in generalization, even when conducting one-shot sparsification that solely utilizes graph structures, without taking feature information into account.
- [542] arXiv:2402.01267 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: The Human and the Mechanical: logos, truthfulness, and ChatGPTComments: Under submissionSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The paper addresses the question of whether it is appropriate to talk about `mechanical minds' at all, and whether ChatGPT models can indeed be thought of as realizations of that. Our paper adds a semantic argument to the current debate. The act of human assertion requires the formation of a veridicality judgment. Modification of assertions with modals (John must be at home) and the use of subjective elements (John is obviously at home) indicate that the speaker is manipulating her judgments and, in a cooperative context, intends her epistemic state to be transparent to the addressee. Veridicality judgments are formed on the basis of two components: (i) evidence that relates to reality (exogenous evidence) and (ii) endogenous evidence, such as preferences and private beliefs. `Mechanical minds' lack these two components: (i) they do not relate to reality and (ii) do not have endogenous evidence. Therefore they lack the ability to form a belief about the world and a veridicality judgments altogether. They can only mimic that judgment, but the output is not ground in the very foundations for it.
- [543] arXiv:2402.01295 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: ExtremeCast: Boosting Extreme Value Prediction for Global Weather ForecastSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Data-driven weather forecast based on machine learning (ML) has experienced rapid development and demonstrated superior performance in the global medium-range forecast compared to traditional physics-based dynamical models. However, most of these ML models struggle with accurately predicting extreme weather, which is closely related to the extreme value prediction. Through mathematical analysis, we prove that the use of symmetric losses, such as the Mean Squared Error (MSE), leads to biased predictions and underestimation of extreme values. To address this issue, we introduce Exloss, a novel loss function that performs asymmetric optimization and highlights extreme values to obtain accurate extreme weather forecast. Furthermore, we introduce a training-free extreme value enhancement strategy named ExEnsemble, which increases the variance of pixel values and improves the forecast robustness. Combined with an advanced global weather forecast model, extensive experiments show that our solution can achieve state-of-the-art performance in extreme weather prediction, while maintaining the overall forecast accuracy comparable to the top medium-range forecast models.
- [544] arXiv:2402.01298 (cross-list from eess.AS) [ pdf , ps , html , other ]
-
Title: Learning Semantic Information from Raw Audio Signal Using Both Contextual and Phonetic RepresentationsComments: Accepted to ICASSP 2024Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI); Sound (cs.SD)
Abstract: We propose a framework to learn semantics from raw audio signals using two types of representations, encoding contextual and phonetic information respectively. Specifically, we introduce a speech-to-unit processing pipeline that captures two types of representations with different time resolutions. For the language model, we adopt a dual-channel architecture to incorporate both types of representation. We also present new training objectives, masked context reconstruction and masked context prediction, that push models to learn semantics effectively. Experiments on the sSIMI metric of Zero Resource Speech Benchmark 2021 and Fluent Speech Command dataset show our framework learns semantics better than models trained with only one type of representation.
- [545] arXiv:2402.01306 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: KTO: Model Alignment as Prospect Theoretic OptimizationComments: preprintSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Kahneman & Tversky's $\textit{prospect theory}$ tells us that humans perceive random variables in a biased but well-defined manner; for example, humans are famously loss-averse. We show that objectives for aligning LLMs with human feedback implicitly incorporate many of these biases -- the success of these objectives (e.g., DPO) over cross-entropy minimization can partly be ascribed to them being $\textit{human-aware loss functions}$ (HALOs). However, the utility functions these methods attribute to humans still differ from those in the prospect theory literature. Using a Kahneman-Tversky model of human utility, we propose a HALO that directly maximizes the utility of generations instead of maximizing the log-likelihood of preferences, as current methods do. We call this approach Kahneman-Tversky Optimization (KTO), and it matches or exceeds the performance of preference-based methods at scales from 1B to 30B. Crucially, KTO does not need preferences -- only a binary signal of whether an output is desirable or undesirable for a given input. This makes it far easier to use in the real world, where preference data is scarce and expensive.
- [546] arXiv:2402.01327 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Supervised Algorithmic Fairness in Distribution Shifts: A SurveyComments: IJCAI 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: Supervised fairness-aware machine learning under distribution shifts is an emerging field that addresses the challenge of maintaining equitable and unbiased predictions when faced with changes in data distributions from source to target domains. In real-world applications, machine learning models are often trained on a specific dataset but deployed in environments where the data distribution may shift over time due to various factors. This shift can lead to unfair predictions, disproportionately affecting certain groups characterized by sensitive attributes, such as race and gender. In this survey, we provide a summary of various types of distribution shifts and comprehensively investigate existing methods based on these shifts, highlighting six commonly used approaches in the literature. Additionally, this survey lists publicly available datasets and evaluation metrics for empirical studies. We further explore the interconnection with related research fields, discuss the significant challenges, and identify potential directions for future studies.
- [547] arXiv:2402.01335 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Simulator-Free Visual Domain Randomization via Video GamesChintan Trivedi , Nemanja Rašajski , Konstantinos Makantasis , Antonios Liapis , Georgios N. YannakakisSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Domain randomization is an effective computer vision technique for improving transferability of vision models across visually distinct domains exhibiting similar content. Existing approaches, however, rely extensively on tweaking complex and specialized simulation engines that are difficult to construct, subsequently affecting their feasibility and scalability. This paper introduces BehAVE, a video understanding framework that uniquely leverages the plethora of existing commercial video games for domain randomization, without requiring access to their simulation engines. Under BehAVE (1) the inherent rich visual diversity of video games acts as the source of randomization and (2) player behavior -- represented semantically via textual descriptions of actions -- guides the *alignment* of videos with similar content. We test BehAVE on 25 games of the first-person shooter (FPS) genre across various video and text foundation models and we report its robustness for domain randomization. BehAVE successfully aligns player behavioral patterns and is able to zero-shot transfer them to multiple unseen FPS games when trained on just one FPS game. In a more challenging setting, BehAVE manages to improve the zero-shot transferability of foundation models to unseen FPS games (up to 22%) even when trained on a game of a different genre (Minecraft). Code and dataset can be found at this https URL .
- [548] arXiv:2402.01345 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Skip \n: A Simple Method to Reduce Hallucination in Large Vision-Language ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Recent advancements in large vision-language models (LVLMs) have demonstrated impressive capability in visual information understanding with human language. Despite these advances, LVLMs still face challenges with multimodal hallucination, such as generating text descriptions of objects that are not present in the visual information. However, the underlying fundamental reasons of multimodal hallucinations remain poorly explored. In this paper, we propose a new perspective, suggesting that the inherent biases in LVLMs might be a key factor in hallucinations. Specifically, we systematically identify a semantic shift bias related to paragraph breaks (\n\n), where the content before and after '\n\n' in the training data frequently exhibit significant semantic changes. This pattern leads the model to infer that the contents following '\n\n' should be obviously different from the preceding contents with less hallucinatory descriptions, thereby increasing the probability of hallucinatory descriptions subsequent to the '\n\n'. We have validated this hypothesis on multiple publicly available LVLMs. Besides, we find that deliberately inserting '\n\n' at the generated description can induce more hallucinations. A simple method is proposed to effectively mitigate the hallucination of LVLMs by skipping the output of '\n'.
- [549] arXiv:2402.01348 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: CORE: Mitigating Catastrophic Forgetting in Continual Learning through Cognitive ReplayComments: Accepted by CogSci24 as oral presentationSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: This paper introduces a novel perspective to significantly mitigate catastrophic forgetting in continuous learning (CL), which emphasizes models' capacity to preserve existing knowledge and assimilate new information. Current replay-based methods treat every task and data sample equally and thus can not fully exploit the potential of the replay buffer. In response, we propose COgnitive REplay (CORE), which draws inspiration from human cognitive review processes. CORE includes two key strategies: Adaptive Quantity Allocation and Quality-Focused Data Selection. The former adaptively modulates the replay buffer allocation for each task based on its forgetting rate, while the latter guarantees the inclusion of representative data that best encapsulates the characteristics of each task within the buffer. Our approach achieves an average accuracy of 37.95% on split-CIFAR10, surpassing the best baseline method by 6.52%. Additionally, it significantly enhances the accuracy of the poorest-performing task by 6.30% compared to the top baseline. Code is available at this https URL .
- [550] arXiv:2402.01349 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Beyond the Answers: Reviewing the Rationality of Multiple Choice Question Answering for the Evaluation of Large Language ModelsComments: 13 pages, 4 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In the field of natural language processing (NLP), Large Language Models (LLMs) have precipitated a paradigm shift, markedly enhancing performance in natural language generation tasks. Despite these advancements, the comprehensive evaluation of LLMs remains an inevitable challenge for the community. Recently, the utilization of Multiple Choice Question Answering (MCQA) as a benchmark for LLMs has gained considerable traction. This study investigates the rationality of MCQA as an evaluation method for LLMs. If LLMs genuinely understand the semantics of questions, their performance should exhibit consistency across the varied configurations derived from the same questions. Contrary to this expectation, our empirical findings suggest a notable disparity in the consistency of LLM responses, which we define as REsponse VAriability Syndrome (REVAS) of the LLMs, indicating that current MCQA-based benchmarks may not adequately capture the true capabilities of LLMs, which underscores the need for more robust evaluation mechanisms in assessing the performance of LLMs.
- [551] arXiv:2402.01352 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Describing Images $\textit{Fast and Slow}$: Quantifying and Predicting the Variation in Human Signals during Visuo-Linguistic ProcessesComments: To appear in EACL 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: There is an intricate relation between the properties of an image and how humans behave while describing the image. This behavior shows ample variation, as manifested in human signals such as eye movements and when humans start to describe the image. Despite the value of such signals of visuo-linguistic variation, they are virtually disregarded in the training of current pretrained models, which motivates further investigation. Using a corpus of Dutch image descriptions with concurrently collected eye-tracking data, we explore the nature of the variation in visuo-linguistic signals, and find that they correlate with each other. Given this result, we hypothesize that variation stems partly from the properties of the images, and explore whether image representations encoded by pretrained vision encoders can capture such variation. Our results indicate that pretrained models do so to a weak-to-moderate degree, suggesting that the models lack biases about what makes a stimulus complex for humans and what leads to variations in human outputs.
- [552] arXiv:2402.01353 (cross-list from cs.LO) [ pdf , ps , other ]
-
Title: Efficient compilation of expressive problem space specifications to neural network solversSubjects: Logic in Computer Science (cs.LO) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Recent work has described the presence of the embedding gap in neural network verification. On one side of the gap is a high-level specification about the network's behaviour, written by a domain expert in terms of the interpretable problem space. On the other side are a logically-equivalent set of satisfiability queries, expressed in the uninterpretable embedding space in a form suitable for neural network solvers. In this paper we describe an algorithm for compiling the former to the latter. We explore and overcome complications that arise from targeting neural network solvers as opposed to standard SMT solvers.
- [553] arXiv:2402.01355 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: FindingEmo: An Image Dataset for Emotion Recognition in the WildComments: 30 pages, 21 figures, 12 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: We introduce FindingEmo, a new image dataset containing annotations for 25k images, specifically tailored to Emotion Recognition. Contrary to existing datasets, it focuses on complex scenes depicting multiple people in various naturalistic, social settings, with images being annotated as a whole, thereby going beyond the traditional focus on faces or single individuals. Annotated dimensions include Valence, Arousal and Emotion label, with annotations gathered using Prolific. Together with the annotations, we release the list of URLs pointing to the original images, as well as all associated source code.
- [554] arXiv:2402.01376 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: LoTR: Low Tensor Rank Weight AdaptationComments: Submitted; missing author and sections were added;Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: In this paper we generalize and extend an idea of low-rank adaptation (LoRA) of large language models (LLMs) based on Transformer architecture. Widely used LoRA-like methods of fine-tuning LLMs are based on matrix factorization of gradient update. We introduce LoTR, a novel approach for parameter-efficient fine-tuning of LLMs which represents a gradient update to parameters in a form of tensor decomposition. Low-rank adapter for each layer is constructed as a product of three matrices, and tensor structure arises from sharing left and right multipliers of this product among layers. Simultaneous compression of a sequence of layers with low-rank tensor representation allows LoTR to archive even better parameter efficiency then LoRA especially for deep models. Moreover, the core tensor does not depend on original weight dimension and can be made arbitrary small, which allows for extremely cheap and fast downstream fine-tuning.
- [555] arXiv:2402.01399 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: A Probabilistic Model to explain Self-Supervised Representation LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: Self-supervised learning (SSL) learns representations by leveraging an auxiliary unsupervised task, such as classifying semantically related samples, e.g. different data augmentations or modalities. Of the many approaches to SSL, contrastive methods, e.g. SimCLR, CLIP and VicREG, have gained attention for learning representations that achieve downstream performance close to that of supervised learning. However, a theoretical understanding of the mechanism behind these methods eludes. We propose a generative latent variable model for the data and show that several families of discriminative self-supervised algorithms, including contrastive methods, approximately induce its latent structure over representations, providing a unifying theoretical framework. We also justify links to mutual information and the use of a projection head. Fitting our model generatively, as SimVE, improves performance over previous VAE methods on common benchmarks (e.g. FashionMNIST, CIFAR10, CelebA), narrows the gap to discriminative methods on _content_ classification and, as our analysis predicts, outperforms them where _style_ information is required, taking a step toward task-agnostic representations.
- [556] arXiv:2402.01401 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Zero-Shot Machine Unlearning at Scale via Lipschitz RegularizationSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: To comply with AI and data regulations, the need to forget private or copyrighted information from trained machine learning models is increasingly important. The key challenge in unlearning is forgetting the necessary data in a timely manner, while preserving model performance. In this work, we address the zero-shot unlearning scenario, whereby an unlearning algorithm must be able to remove data given only a trained model and the data to be forgotten. Under such a definition, existing state-of-the-art methods are insufficient. Building on the concepts of Lipschitz continuity, we present a method that induces smoothing of the forget sample's output, with respect to perturbations of that sample. We show this smoothing successfully results in forgetting while preserving general model performance. We perform extensive empirical evaluation of our method over a range of contemporary benchmarks, verifying that our method achieves state-of-the-art performance under the strict constraints of zero-shot unlearning.
- [557] arXiv:2402.01408 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Climbing the Ladder of Interpretability with Counterfactual Concept Bottleneck ModelsGabriele Dominici , Pietro Barbiero , Francesco Giannini , Martin Gjoreski , Giuseppe Marra , Marc LangheinrichSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Current deep learning models are not designed to simultaneously address three fundamental questions: predict class labels to solve a given classification task (the "What?"), explain task predictions (the "Why?"), and imagine alternative scenarios that could result in different predictions (the "What if?"). The inability to answer these questions represents a crucial gap in deploying reliable AI agents, calibrating human trust, and deepening human-machine interaction. To bridge this gap, we introduce CounterFactual Concept Bottleneck Models (CF-CBMs), a class of models designed to efficiently address the above queries all at once without the need to run post-hoc searches. Our results show that CF-CBMs produce: accurate predictions (the "What?"), simple explanations for task predictions (the "Why?"), and interpretable counterfactuals (the "What if?"). CF-CBMs can also sample or estimate the most probable counterfactual to: (i) explain the effect of concept interventions on tasks, (ii) show users how to get a desired class label, and (iii) propose concept interventions via "task-driven" interventions.
- [558] arXiv:2402.01410 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: XAI for Skin Cancer Detection with Prototypes and Non-Expert SupervisionComments: Accepted in the iMIMIC Workshop @ MICCAI 2023Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Skin cancer detection through dermoscopy image analysis is a critical task. However, existing models used for this purpose often lack interpretability and reliability, raising the concern of physicians due to their black-box nature. In this paper, we propose a novel approach for the diagnosis of melanoma using an interpretable prototypical-part model. We introduce a guided supervision based on non-expert feedback through the incorporation of: 1) binary masks, obtained automatically using a segmentation network; and 2) user-refined prototypes. These two distinct information pathways aim to ensure that the learned prototypes correspond to relevant areas within the skin lesion, excluding confounding factors beyond its boundaries. Experimental results demonstrate that, even without expert supervision, our approach achieves superior performance and generalization compared to non-interpretable models.
- [559] arXiv:2402.01415 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: SMLP: Symbolic Machine Learning ProverComments: 12 pages, 4 figures. (submitted)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Symbolic Computation (cs.SC); Optimization and Control (math.OC)
Abstract: Symbolic Machine Learning Prover (SMLP) is a tool and a library for system exploration based on data samples obtained by simulating or executing the system on a number of input vectors. SMLP aims at exploring the system based on this data by taking a grey-box approach: SMLP combines statistical methods of data exploration with building and exploring machine learning models in close feedback loop with the system's response, and exploring these models by combining probabilistic and formal methods. SMLP has been applied in industrial setting at Intel for analyzing and optimizing hardware designs at the analog level. SMLP is a general purpose tool and can be applied to systems that can be sampled and modeled by machine learning models.
- [560] arXiv:2402.01416 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Sequence Shortening for Context-Aware Machine TranslationComments: Findings of the ACL: EACL 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Context-aware Machine Translation aims to improve translations of sentences by incorporating surrounding sentences as context. Towards this task, two main architectures have been applied, namely single-encoder (based on concatenation) and multi-encoder models. In this study, we show that a special case of multi-encoder architecture, where the latent representation of the source sentence is cached and reused as the context in the next step, achieves higher accuracy on the contrastive datasets (where the models have to rank the correct translation among the provided sentences) and comparable BLEU and COMET scores as the single- and multi-encoder approaches. Furthermore, we investigate the application of Sequence Shortening to the cached representations. We test three pooling-based shortening techniques and introduce two novel methods - Latent Grouping and Latent Selecting, where the network learns to group tokens or selects the tokens to be cached as context. Our experiments show that the two methods achieve competitive BLEU and COMET scores and accuracies on the contrastive datasets to the other tested methods while potentially allowing for higher interpretability and reducing the growth of memory requirements with increased context size.
- [561] arXiv:2402.01439 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: From Words to Molecules: A Survey of Large Language Models in ChemistryComments: Submitted to IJCAI 2024 survey trackSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Biomolecules (q-bio.BM); Quantitative Methods (q-bio.QM)
Abstract: In recent years, Large Language Models (LLMs) have achieved significant success in natural language processing (NLP) and various interdisciplinary areas. However, applying LLMs to chemistry is a complex task that requires specialized domain knowledge. This paper provides a thorough exploration of the nuanced methodologies employed in integrating LLMs into the field of chemistry, delving into the complexities and innovations at this interdisciplinary juncture. Specifically, our analysis begins with examining how molecular information is fed into LLMs through various representation and tokenization methods. We then categorize chemical LLMs into three distinct groups based on the domain and modality of their input data, and discuss approaches for integrating these inputs for LLMs. Furthermore, this paper delves into the pretraining objectives with adaptations to chemical LLMs. After that, we explore the diverse applications of LLMs in chemistry, including novel paradigms for their application in chemistry tasks. Finally, we identify promising research directions, including further integration with chemical knowledge, advancements in continual learning, and improvements in model interpretability, paving the way for groundbreaking developments in the field.
- [562] arXiv:2402.01440 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Few-Shot Learning on Graphs: from Meta-learning to Pre-training and PromptingXingtong Yu , Yuan Fang , Zemin Liu , Yuxia Wu , Zhihao Wen , Jianyuan Bo , Xinming Zhang , Steven C.H. HoiSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Abstract: Graph representation learning, a critical step in graph-centric tasks, has seen significant advancements. Earlier techniques often operate in an end-to-end setting, where performance heavily relies on the availability of ample labeled data. This constraint has spurred the emergence of few-shot learning on graphs, where only a few task-specific labels are available for each task. Given the extensive literature in this field, this survey endeavors to synthesize recent developments, provide comparative insights, and identify future directions. We systematically categorize existing studies into three major families: meta-learning approaches, pre-training approaches, and hybrid approaches, with a finer-grained classification in each family to aid readers in their method selection process. Within each category, we analyze the relationships among these methods and compare their strengths and limitations. Finally, we outline prospective future directions for few-shot learning on graphs to catalyze continued innovation in this field.
- [563] arXiv:2402.01444 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Mission Critical -- Satellite Data is a Distinct Modality in Machine LearningComments: 15 pages, 5 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Satellite data has the potential to inspire a seismic shift for machine learning -- one in which we rethink existing practices designed for traditional data modalities. As machine learning for satellite data (SatML) gains traction for its real-world impact, our field is at a crossroads. We can either continue applying ill-suited approaches, or we can initiate a new research agenda that centers around the unique characteristics and challenges of satellite data. This position paper argues that satellite data constitutes a distinct modality for machine learning research and that we must recognize it as such to advance the quality and impact of SatML research across theory, methods, and deployment. We outline critical discussion questions and actionable suggestions to transform SatML from merely an intriguing application area to a dedicated research discipline that helps move the needle on big challenges for machine learning and society.
- [564] arXiv:2402.01446 (cross-list from cs.MA) [ pdf , ps , other ]
-
Title: Guidance Graph Optimization for Lifelong Multi-Agent Path FindingComments: Accepted to International Joint Conference on Artificial Intelligence (IJCAI), 2024Subjects: Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: We study how to use guidance to improve the throughput of lifelong Multi-Agent Path Finding (MAPF). Previous studies have demonstrated that, while incorporating guidance, such as highways, can accelerate MAPF algorithms, this often results in a trade-off with solution quality. In addition, how to generate good guidance automatically remains largely unexplored, with current methods falling short of surpassing manually designed ones. In this work, we introduce the guidance graph as a versatile representation of guidance for lifelong MAPF, framing Guidance Graph Optimization as the task of optimizing its edge weights. We present two GGO algorithms to automatically generate guidance for arbitrary lifelong MAPF algorithms and maps. The first method directly optimizes edge weights, while the second method optimizes an update model capable of generating edge weights. Empirically, we show that (1) our guidance graphs improve the throughput of three representative lifelong MAPF algorithms in eight benchmark maps, and (2) our update model can generate guidance graphs for as large as $93 \times 91$ maps and as many as 3,000 agents. We include the source code at: \url{ this https URL }. All optimized guidance graphs are available online at: \url{ this https URL }.
- [565] arXiv:2402.01454 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Integrating Large Language Models in Causal Discovery: A Statistical Causal ApproachMasayuki Takayama , Tadahisa Okuda , Thong Pham , Tatsuyoshi Ikenoue , Shingo Fukuma , Shohei Shimizu , Akiyoshi SannaiSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Methodology (stat.ME); Machine Learning (stat.ML)
Abstract: In practical statistical causal discovery (SCD), embedding domain expert knowledge as constraints into the algorithm is widely accepted as significant for creating consistent meaningful causal models, despite the recognized challenges in systematic acquisition of the background knowledge. To overcome these challenges, this paper proposes a novel methodology for causal inference, in which SCD methods and knowledge based causal inference (KBCI) with a large language model (LLM) are synthesized through "statistical causal prompting (SCP)" for LLMs and prior knowledge augmentation for SCD. Experiments have revealed that GPT-4 can cause the output of the LLM-KBCI and the SCD result with prior knowledge from LLM-KBCI to approach the ground truth, and that the SCD result can be further improved, if GPT-4 undergoes SCP. Furthermore, it has been clarified that an LLM can improve SCD with its background knowledge, even if the LLM does not contain information on the dataset. The proposed approach can thus address challenges such as dataset biases and limitations, illustrating the potential of LLMs to improve data-driven causal inference across diverse scientific domains.
- [566] arXiv:2402.01467 (cross-list from eess.SY) [ pdf , ps , other ]
-
Title: Brain-Like Replay Naturally Emerges in Reinforcement Learning AgentsSubjects: Systems and Control (eess.SY) ; Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)
Abstract: Can replay, as a widely observed neural activity pattern in brain regions, particularly in the hippocampus and neocortex, emerge in an artificial agent? If yes, does it contribute to the tasks? In this work, without heavy dependence on complex assumptions, we discover naturally emergent replay under task-optimized paradigm using a recurrent neural network-based reinforcement learning model, which mimics the hippocampus and prefrontal cortex, as well as their intercommunication and the sensory cortex input. The emergent replay in the hippocampus, which results from the episodic memory and cognitive map as well as environment observations, well resembles animal experimental data and serves as an effective indicator of high task performance. The model also successfully reproduces local and nonlocal replay, which matches the human experimental data. Our work provides a new avenue for understanding the mechanisms behind replay.
- [567] arXiv:2402.01476 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Self-Attention through Kernel-Eigen Pair Sparse Variational Gaussian ProcessesComments: We propose Kernel-Eigen Pair Sparse Variational Gaussian Processes (KEP-SVGP) for building uncertainty-aware self-attention where the asymmetry of attention kernel is tackled by KSVD and a reduced time complexity is acquiredSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Abstract: While the great capability of Transformers significantly boosts prediction accuracy, it could also yield overconfident predictions and require calibrated uncertainty estimation, which can be commonly tackled by Gaussian processes (GPs). Existing works apply GPs with symmetric kernels under variational inference to the attention kernel; however, omitting the fact that attention kernels are in essence asymmetric. Moreover, the complexity of deriving the GP posteriors remains high for large-scale data. In this work, we propose Kernel-Eigen Pair Sparse Variational Gaussian Processes (KEP-SVGP) for building uncertainty-aware self-attention where the asymmetry of attention kernels is tackled by Kernel SVD (KSVD) and a reduced complexity is acquired. Through KEP-SVGP, i) the SVGP pair induced by the two sets of singular vectors from KSVD w.r.t. the attention kernel fully characterizes the asymmetry; ii) using only a small set of adjoint eigenfunctions from KSVD, the derivation of SVGP posteriors can be based on the inversion of a diagonal matrix containing singular values, contributing to a reduction in time complexity; iii) an evidence lower bound is derived so that variational parameters can be optimized towards this objective. Experiments verify our excellent performances and efficiency on in-distribution, distribution-shift and out-of-distribution benchmarks.
- [568] arXiv:2402.01481 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Multi-level protein pre-training with Vabs-NetSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Biomolecules (q-bio.BM)
Abstract: In recent years, there has been a surge in the development of 3D structure-based pre-trained protein models, representing a significant advancement over pre-trained protein language models in various downstream tasks. However, most existing structure-based pre-trained models primarily focus on the residue level, i.e., alpha carbon atoms, while ignoring other atoms like side chain atoms. We argue that modeling proteins at both residue and atom levels is important since the side chain atoms can also be crucial for numerous downstream tasks, for example, molecular docking. Nevertheless, we find that naively combining residue and atom information during pre-training typically fails. We identify a key reason is the information leakage caused by the inclusion of atom structure in the input, which renders residue-level pre-training tasks trivial and results in insufficiently expressive residue representations. To address this issue, we introduce a span mask pre-training strategy on 3D protein chains to learn meaningful representations of both residues and atoms. This leads to a simple yet effective approach to learning protein representation suitable for diverse downstream tasks. Extensive experimental results on binding site prediction and function prediction tasks demonstrate our proposed pre-training approach significantly outperforms other methods. Our code will be made public.
- [569] arXiv:2402.01515 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Enhancing Stochastic Gradient Descent: A Unified Framework and Novel Acceleration Methods for Faster ConvergenceSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Abstract: Based on SGD, previous works have proposed many algorithms that have improved convergence speed and generalization in stochastic optimization, such as SGDm, AdaGrad, Adam, etc. However, their convergence analysis under non-convex conditions is challenging. In this work, we propose a unified framework to address this issue. For any first-order methods, we interpret the updated direction $g_t$ as the sum of the stochastic subgradient $\nabla f_t(x_t)$ and an additional acceleration term $\frac{2|\langle v_t, \nabla f_t(x_t) \rangle|}{\|v_t\|_2^2} v_t$, thus we can discuss the convergence by analyzing $\langle v_t, \nabla f_t(x_t) \rangle$. Through our framework, we have discovered two plug-and-play acceleration methods: \textbf{Reject Accelerating} and \textbf{Random Vector Accelerating}, we theoretically demonstrate that these two methods can directly lead to an improvement in convergence rate.
- [570] arXiv:2402.01521 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: K-Level Reasoning with Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: While Large Language Models (LLMs) have demonstrated their proficiency in complex reasoning tasks, their performance in dynamic, interactive, and competitive scenarios - such as business strategy and stock market analysis - remains underexplored. To bridge this gap, we formally explore the dynamic reasoning capabilities of LLMs for decision-making in rapidly evolving environments. We introduce two game theory-based pilot challenges that mirror the complexities of real-world dynamic decision-making. These challenges are well-defined, enabling clear, controllable, and precise evaluation of LLMs' dynamic reasoning abilities. Through extensive experiments, we find that existing reasoning methods tend to falter in dynamic settings that require k-level thinking - a key concept not tackled by previous works. To address this, we propose a novel reasoning approach for LLMs, named "K-Level Reasoning". This approach adopts the perspective of rivals to recursively employ k-level thinking based on available historical information, which significantly improves the prediction accuracy of rivals' subsequent moves and informs more strategic decision-making. This research not only sets a robust quantitative benchmark for the assessment of dynamic reasoning but also markedly enhances the proficiency of LLMs in dynamic contexts.
- [571] arXiv:2402.01535 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: An Empirical Analysis of Diversity in Argument SummarizationComments: Accepted at EACL2024 (main proceedings)Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Presenting high-level arguments is a crucial task for fostering participation in online societal discussions. Current argument summarization approaches miss an important facet of this task -- capturing diversity -- which is important for accommodating multiple perspectives. We introduce three aspects of diversity: those of opinions, annotators, and sources. We evaluate approaches to a popular argument summarization task called Key Point Analysis, which shows how these approaches struggle to (1) represent arguments shared by few people, (2) deal with data from various sources, and (3) align with subjectivity in human-provided annotations. We find that both general-purpose LLMs and dedicated KPA models exhibit this behavior, but have complementary strengths. Further, we observe that diversification of training data may ameliorate generalization. Addressing diversity in argument summarization requires a mix of strategies to deal with subjectivity.
- [572] arXiv:2402.01536 (cross-list from cs.HC) [ pdf , ps , html , other ]
-
Title: Homogenization Effects of Large Language Models on Human Creative IdeationComments: 20 pages, 7 figuresSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Large language models (LLMs) are now being used in a wide variety of contexts, including as creativity support tools (CSTs) intended to help their users come up with new ideas. But do LLMs actually support user creativity? We hypothesized that the use of an LLM as a CST might make the LLM's users feel more creative, and even broaden the range of ideas suggested by each individual user, but also homogenize the ideas suggested by different users. We conducted a 36-participant comparative user study and found, in accordance with the homogenization hypothesis, that different users tended to produce less semantically distinct ideas with ChatGPT than with an alternative CST. Additionally, ChatGPT users generated a greater number of more detailed ideas, but felt less responsible for the ideas they generated. We discuss potential implications of these findings for users, designers, and developers of LLM-based CSTs.
- [573] arXiv:2402.01537 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Closing the Gap in Human Behavior Analysis: A Pipeline for Synthesizing Trimodal DataSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: In pervasive machine learning, especially in Human Behavior Analysis (HBA), RGB has been the primary modality due to its accessibility and richness of information. However, linked with its benefits are challenges, including sensitivity to lighting conditions and privacy concerns. One possibility to overcome these vulnerabilities is to resort to different modalities. For instance, thermal is particularly adept at accentuating human forms, while depth adds crucial contextual layers. Despite their known benefits, only a few HBA-specific datasets that integrate these modalities exist. To address this shortage, our research introduces a novel generative technique for creating trimodal, i.e., RGB, thermal, and depth, human-focused datasets. This technique capitalizes on human segmentation masks derived from RGB images, combined with thermal and depth backgrounds that are sourced automatically. With these two ingredients, we synthesize depth and thermal counterparts from existing RGB data utilizing conditional image-to-image translation. By employing this approach, we generate trimodal data that can be leveraged to train models for settings with limited data, bad lightning conditions, or privacy-sensitive areas.
- [574] arXiv:2402.01546 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Privacy-Preserving Distributed Learning for Residential Short-Term Load ForecastingSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Multiagent Systems (cs.MA); Systems and Control (eess.SY)
Abstract: In the realm of power systems, the increasing involvement of residential users in load forecasting applications has heightened concerns about data privacy. Specifically, the load data can inadvertently reveal the daily routines of residential users, thereby posing a risk to their property security. While federated learning (FL) has been employed to safeguard user privacy by enabling model training without the exchange of raw data, these FL models have shown vulnerabilities to emerging attack techniques, such as Deep Leakage from Gradients and poisoning attacks. To counteract these, we initially employ a Secure-Aggregation (SecAgg) algorithm that leverages multiparty computation cryptographic techniques to mitigate the risk of gradient leakage. However, the introduction of SecAgg necessitates the deployment of additional sub-center servers for executing the multiparty computation protocol, thereby escalating computational complexity and reducing system robustness, especially in scenarios where one or more sub-centers are unavailable. To address these challenges, we introduce a Markovian Switching-based distributed training framework, the convergence of which is substantiated through rigorous theoretical analysis. The Distributed Markovian Switching (DMS) topology shows strong robustness towards the poisoning attacks as well. Case studies employing real-world power system load data validate the efficacy of our proposed algorithm. It not only significantly minimizes communication complexity but also maintains accuracy levels comparable to traditional FL methods, thereby enhancing the scalability of our load forecasting algorithm.
- [575] arXiv:2402.01566 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Boximator: Generating Rich and Controllable Motions for Video SynthesisComments: 16 pages, 9 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Generating rich and controllable motion is a pivotal challenge in video synthesis. We propose Boximator, a new approach for fine-grained motion control. Boximator introduces two constraint types: hard box and soft box. Users select objects in the conditional frame using hard boxes and then use either type of boxes to roughly or rigorously define the object's position, shape, or motion path in future frames. Boximator functions as a plug-in for existing video diffusion models. Its training process preserves the base model's knowledge by freezing the original weights and training only the control module. To address training challenges, we introduce a novel self-tracking technique that greatly simplifies the learning of box-object correlations. Empirically, Boximator achieves state-of-the-art video quality (FVD) scores, improving on two base models, and further enhanced after incorporating box constraints. Its robust motion controllability is validated by drastic increases in the bounding box alignment metric. Human evaluation also shows that users favor Boximator generation results over the base model.
- [576] arXiv:2402.01580 (cross-list from cs.CY) [ pdf , ps , html , other ]
-
Title: Generative AI for Education (GAIED): Advances, Opportunities, and ChallengesPaul Denny , Sumit Gulwani , Neil T. Heffernan , Tanja Käser , Steven Moore , Anna N. Rafferty , Adish SinglaSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: This survey article has grown out of the GAIED (pronounced "guide") workshop organized by the authors at the NeurIPS 2023 conference. We organized the GAIED workshop as part of a community-building effort to bring together researchers, educators, and practitioners to explore the potential of generative AI for enhancing education. This article aims to provide an overview of the workshop activities and highlight several future research directions in the area of GAIED.
- [577] arXiv:2402.01586 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: TrustAgent: Towards Safe and Trustworthy LLM-based Agents through Agent ConstitutionComments: 16 pages, 3 figures, 5 tables, comments and suggestions are welcomeSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Abstract: The emergence of LLM-based agents has garnered considerable attention, yet their trustworthiness remains an under-explored area. As agents can directly interact with the physical environment, their reliability and safety is critical. This paper presents an Agent-Constitution-based agent framework, TrustAgent, an initial investigation into improving the safety dimension of trustworthiness in LLM-based agents. This framework consists of threefold strategies: pre-planning strategy which injects safety knowledge to the model prior to plan generation, in-planning strategy which bolsters safety during plan generation, and post-planning strategy which ensures safety by post-planning inspection. Through experimental analysis, we demonstrate how these approaches can effectively elevate an LLM agent's safety by identifying and preventing potential dangers. Furthermore, we explore the intricate relationships between safety and helpfulness, and between the model's reasoning ability and its efficacy as a safe agent. This paper underscores the imperative of integrating safety awareness and trustworthiness into the design and deployment of LLM-based agents, not only to enhance their performance but also to ensure their responsible integration into human-centric environments. Data and code are available at this https URL .
- [578] arXiv:2402.01591 (cross-list from eess.AS) [ pdf , ps , other ]
-
Title: BAT: Learning to Reason about Spatial Sounds with Large Language ModelsComments: Preprint, work in progressSubjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
Abstract: Spatial sound reasoning is a fundamental human skill, enabling us to navigate and interpret our surroundings based on sound. In this paper we present BAT, which combines the spatial sound perception ability of a binaural acoustic scene analysis model with the natural language reasoning capabilities of a large language model (LLM) to replicate this innate ability. To address the lack of existing datasets of in-the-wild spatial sounds, we synthesized a binaural audio dataset using AudioSet and SoundSpaces 2.0. Next, we developed SpatialSoundQA, a spatial sound-based question-answering dataset, offering a range of QA tasks that train BAT in various aspects of spatial sound perception and reasoning. The acoustic front end encoder of BAT is a novel spatial audio encoder named Spatial Audio Spectrogram Transformer, or Spatial-AST, which by itself achieves strong performance across sound event detection, spatial localization, and distance estimation. By integrating Spatial-AST with LLaMA-2 7B model, BAT transcends standard Sound Event Localization and Detection (SELD) tasks, enabling the model to reason about the relationships between the sounds in its environment. Our experiments demonstrate BAT's superior performance on both spatial sound perception and reasoning, showcasing the immense potential of LLMs in navigating and interpreting complex spatial audio environments.
- [579] arXiv:2402.01613 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Nomic Embed: Training a Reproducible Long Context Text EmbedderSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: This technical report describes the training of nomic-embed-text-v1, the first fully reproducible, open-source, open-weights, open-data, 8192 context length English text embedding model that outperforms both OpenAI Ada-002 and OpenAI text-embedding-3-small on short and long-context tasks. We release the training code and model weights under an Apache 2 license. In contrast with other open-source models, we release a training data loader with 235 million curated text pairs that allows for the full replication of nomic-embed-text-v1. You can find code and data to replicate the model at this https URL
- [580] arXiv:2402.01614 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: L2G2G: a Scalable Local-to-Global Network Embedding with Graph AutoencodersComments: 13 pages, 4 figures, Complex Networks 2023, Volume I, SCI 1141Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Abstract: For analysing real-world networks, graph representation learning is a popular tool. These methods, such as a graph autoencoder (GAE), typically rely on low-dimensional representations, also called embeddings, which are obtained through minimising a loss function; these embeddings are used with a decoder for downstream tasks such as node classification and edge prediction. While GAEs tend to be fairly accurate, they suffer from scalability issues. For improved speed, a Local2Global approach, which combines graph patch embeddings based on eigenvector synchronisation, was shown to be fast and achieve good accuracy. Here we propose L2G2G, a Local2Global method which improves GAE accuracy without sacrificing scalability. This improvement is achieved by dynamically synchronising the latent node representations, while training the GAEs. It also benefits from the decoder computing an only local patch loss. Hence, aligning the local embeddings in each epoch utilises more information from the graph than a single post-training alignment does, while maintaining scalability. We illustrate on synthetic benchmarks, as well as real-world examples, that L2G2G achieves higher accuracy than the standard Local2Global approach and scales efficiently on the larger data sets. We find that for large and dense networks, it even outperforms the slow, but assumed more accurate, GAEs.
- [581] arXiv:2402.01643 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: L-TUNING: Synchronized Label Tuning for Prompt and Prefix in LLMsComments: Published in the ICLR TinyPaper trackSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Efficiently fine-tuning Large Language Models (LLMs) for specific tasks presents a considerable challenge in natural language processing. Traditional methods, like prompt or prefix tuning, typically rely on arbitrary tokens for training, leading to prolonged training times and generalized token use across various class labels. To address these issues, this paper introduces L-Tuning, an efficient fine-tuning approach designed for classification tasks within the Natural Language Inference (NLI) framework. Diverging from conventional methods, L-Tuning focuses on the fine-tuning of label tokens processed through a pre-trained LLM, thereby harnessing its pre-existing semantic knowledge. This technique not only improves the fine-tuning accuracy and efficiency but also facilitates the generation of distinct label embeddings for each class, enhancing the model's training nuance. Our experimental results indicate a significant improvement in training efficiency and classification accuracy with L-Tuning compared to traditional approaches, marking a promising advancement in fine-tuning LLMs for complex language tasks.
- [582] arXiv:2402.01647 (cross-list from cs.CY) [ pdf , ps , html , other ]
-
Title: Build Your Own Robot Friend: An Open-Source Learning Module for Accessible and Engaging AI EducationZhonghao Shi , Allison O'Connell , Zongjian Li , Siqi Liu , Jennifer Ayissi , Guy Hoffman , Mohammad Soleymani , Maja J. MatarićComments: Accepted to the Proceedings of the AAAI Conference on Artificial Intelligence (2024)Subjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Robotics (cs.RO)
Abstract: As artificial intelligence (AI) is playing an increasingly important role in our society and global economy, AI education and literacy have become necessary components in college and K-12 education to prepare students for an AI-powered society. However, current AI curricula have not yet been made accessible and engaging enough for students and schools from all socio-economic backgrounds with different educational goals. In this work, we developed an open-source learning module for college and high school students, which allows students to build their own robot companion from the ground up. This open platform can be used to provide hands-on experience and introductory knowledge about various aspects of AI, including robotics, machine learning (ML), software engineering, and mechanical engineering. Because of the social and personal nature of a socially assistive robot companion, this module also puts a special emphasis on human-centered AI, enabling students to develop a better understanding of human-AI interaction and AI ethics through hands-on learning activities. With open-source documentation, assembling manuals and affordable materials, students from different socio-economic backgrounds can personalize their learning experience based on their individual educational goals. To evaluate the student-perceived quality of our module, we conducted a usability testing workshop with 15 college students recruited from a minority-serving institution. Our results indicate that our AI module is effective, easy-to-follow, and engaging, and it increases student interest in studying AI/ML and robotics in the future. We hope that this work will contribute toward accessible and engaging AI education in human-AI interaction for college and high school students.
- [583] arXiv:2402.01651 (cross-list from cs.CY) [ pdf , ps , html , other ]
-
Title: Informed AI Regulation: Comparing the Ethical Frameworks of Leading LLM Chatbots Using an Ethics-Based Audit to Assess Moral Reasoning and Normative ValuesComments: 23 pages, 6 figures (3 as tables), 1 table (in LaTeX)Subjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: With the rise of individual and collaborative networks of autonomous agents, AI is deployed in more key reasoning and decision-making roles. For this reason, ethics-based audits play a pivotal role in the rapidly growing fields of AI safety and regulation. This paper undertakes an ethics-based audit to probe the 8 leading commercial and open-source Large Language Models including GPT-4. We assess explicability and trustworthiness by a) establishing how well different models engage in moral reasoning and b) comparing normative values underlying models as ethical frameworks. We employ an experimental, evidence-based approach that challenges the models with ethical dilemmas in order to probe human-AI alignment. The ethical scenarios are designed to require a decision in which the particulars of the situation may or may not necessitate deviating from normative ethical principles. A sophisticated ethical framework was consistently elicited in one model, GPT-4. Nonetheless, troubling findings include underlying normative frameworks with clear bias towards particular cultural norms. Many models also exhibit disturbing authoritarian tendencies. Code is available at this https URL .
- [584] arXiv:2402.01652 (cross-list from cs.CY) [ pdf , ps , html , other ]
-
Title: User-Centric AI Analytics for Chronic Health Conditions ManagementComments: Keynote talk at IEEE Conference on Intelligent Methods, Systems, and Applications (IMSA), Cairo, Egypt, July 2023Subjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Abstract: The use of AI analytics in health informatics has seen a rapid growth in recent years. In this talk, we look at AI analytics use in managing chronic health conditions such as diabetes, obesity, etc. We focus on the challenges in managing these conditions especially with drug-free approaches due to the variations in individual circumstances. These variations directed the research into user-centric approach leading to variety of research questions. In this short paper, we give examples from recent and current research work and conclude with what, in our opinion, to be the next steps and some remaining open research questions.
- [585] arXiv:2402.01654 (cross-list from eess.SP) [ pdf , ps , other ]
-
Title: A Scoping Review of Energy Load DisaggregationJournal-ref: Progress in Artificial Intelligence. EPIA 2023. Lecture Notes in Computer Science, vol 14116Subjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: Energy load disaggregation can contribute to balancing power grids by enhancing the effectiveness of demand-side management and promoting electricity-saving behavior through increased consumer awareness. However, the field currently lacks a comprehensive overview. To address this gap, this paper con-ducts a scoping review of load disaggregation domains, data types, and methods, by assessing 72 full-text journal articles. The findings reveal that domestic electricity consumption is the most researched area, while others, such as industrial load disaggregation, are rarely discussed. The majority of research uses relatively low-frequency data, sampled between 1 and 60 seconds. A wide variety of methods are used, and artificial neural networks are the most common, followed by optimization strategies, Hidden Markov Models, and Graph Signal Processing approaches.
- [586] arXiv:2402.01655 (cross-list from cs.CY) [ pdf , ps , html , other ]
-
Title: A Deep Learning Approach Towards Student Performance Prediction in Online Courses: Challenges Based on a Global PerspectiveAbdallah Moubayed , MohammadNoor Injadat , Nouh Alhindawi , Ghassan Samara , Sara Abuasal , Raed AlazaidahComments: Accepted and presented in 24th International Arab Conference on Information Technology (ACIT'2023)Subjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Analyzing and evaluating students' progress in any learning environment is stressful and time consuming if done using traditional analysis methods. This is further exasperated by the increasing number of students due to the shift of focus toward integrating the Internet technologies in education and the focus of academic institutions on moving toward e-Learning, blended, or online learning models. As a result, the topic of student performance prediction has become a vibrant research area in recent years. To address this, machine learning and data mining techniques have emerged as a viable solution. To that end, this work proposes the use of deep learning techniques (CNN and RNN-LSTM) to predict the students' performance at the midpoint stage of the online course delivery using three distinct datasets collected from three different regions of the world. Experimental results show that deep learning models have promising performance as they outperform other optimized traditional ML models in two of the three considered datasets while also having comparable performance for the third dataset.
- [587] arXiv:2402.01656 (cross-list from cs.CY) [ pdf , ps , html , other ]
-
Title: Promises and pitfalls of artificial intelligence for legal applicationsSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: Is AI set to redefine the legal profession? We argue that this claim is not supported by the current evidence. We dive into AI's increasingly prevalent roles in three types of legal tasks: information processing; tasks involving creativity, reasoning, or judgment; and predictions about the future. We find that the ease of evaluating legal applications varies greatly across legal tasks, based on the ease of identifying correct answers and the observability of information relevant to the task at hand. Tasks that would lead to the most significant changes to the legal profession are also the ones most prone to overoptimism about AI capabilities, as they are harder to evaluate. We make recommendations for better evaluation and deployment of AI in legal contexts.
- [588] arXiv:2402.01659 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Generative Artificial Intelligence in Higher Education: Evidence from an Analysis of Institutional Policies and GuidelinesSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: The release of ChatGPT in November 2022 prompted a massive uptake of generative artificial intelligence (GenAI) across higher education institutions (HEIs). HEIs scrambled to respond to its use, especially by students, looking first to regulate it and then arguing for its productive integration within teaching and learning. In the year since the release, HEIs have increasingly provided policies and guidelines to direct GenAI. In this paper we examined documents produced by 116 US universities categorized as high research activity or R1 institutions to comprehensively understand GenAI related advice and guidance given to institutional stakeholders. Through an extensive analysis, we found the majority of universities (N=73, 63%) encourage the use of GenAI and many provide detailed guidance for its use in the classroom (N=48, 41%). More than half of all institutions provided sample syllabi (N=65, 56%) and half (N=58, 50%) provided sample GenAI curriculum and activities that would help instructors integrate and leverage GenAI in their classroom. Notably, most guidance for activities focused on writing, whereas code and STEM-related activities were mentioned half the time and vaguely even when they were (N=58, 50%). Finally, more than one half of institutions talked about the ethics of GenAI on a range of topics broadly, including Diversity, Equity and Inclusion (DEI) (N=60, 52%). Overall, based on our findings we caution that guidance for faculty can become burdensome as extensive revision of pedagogical approaches is often recommended in the policies.
- [589] arXiv:2402.01662 (cross-list from cs.CY) [ pdf , ps , html , other ]
-
Title: Generative Ghosts: Anticipating Benefits and Risks of AI AfterlivesComments: version 2, updated May 8, 2024 to included updated references and new case study pointers as the trend of generative ghosts acceleratesSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: As AI systems quickly improve in both breadth and depth of performance, they lend themselves to creating increasingly powerful and realistic agents, including the possibility of agents modeled on specific people. We anticipate that within our lifetimes it may become common practice for people to create a custom AI agent to interact with loved ones and/or the broader world after death. We call these generative ghosts, since such agents will be capable of generating novel content rather than merely parroting content produced by their creator while living. In this paper, we first discuss the design space of potential implementations of generative ghosts. We then discuss the practical and ethical implications of generative ghosts, including potential positive and negative impacts on individuals and society. Based on these considerations, we lay out a research agenda for the AI and HCI research communities to empower people to create and interact with AI afterlives in a safe and beneficial manner.
- [590] arXiv:2402.01668 (cross-list from cs.CY) [ pdf , ps , html , other ]
-
Title: Determining the Difficulties of Students With Dyslexia via Virtual Reality and Artificial Intelligence: An Exploratory AnalysisEnrique Yeguas-Bolívar , José M. Alcalde-Llergo , Pilar Aparicio-Martínez , Juri Taborri , Andrea Zingoni , Sara PinziComments: 7 pages, 5 figures, 3 tables, MetroXRAINE 2022 Conference, VRAILEXIA european projectJournal-ref: 2022 IEEE International Conference on Metrology for Extended Reality, Artificial Intelligence and Neural Engineering (MetroXRAINE), Rome, Italy, 2022, pp. 585-590Subjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Human-Computer Interaction (cs.HC)
Abstract: Learning disorders are neurological conditions that affect the brain's ability to interconnect communication areas. Dyslexic students experience problems with reading, memorizing, and exposing concepts; however the magnitude of these can be mitigated through both therapies and the creation of compensatory mechanisms. Several efforts have been made to mitigate these issues, leading to the creation of digital resources for students with specific learning disorders attending primary and secondary education levels. Conversely, a standard approach is still missed in higher education. The VRAIlexia project has been created to tackle this issue by proposing two different tools: a mobile application integrating virtual reality (VR) to collect data quickly and easily, and an artificial intelligencebased software (AI) to analyze the collected data for customizing the supporting methodology for each student. The first one has been created and is being distributed among dyslexic students in Higher Education Institutions, for the conduction of specific psychological and psychometric tests. The second tool applies specific artificial intelligence algorithms to the data gathered via the application and other surveys. These AI techniques have allowed us to identify the most relevant difficulties faced by the students' cohort. Our different models have obtained around 90\% mean accuracy for predicting the support tools and learning strategies.
- [591] arXiv:2402.01669 (cross-list from cs.CY) [ pdf , ps , html , other ]
-
Title: Improved Performances and Motivation in Intelligent Tutoring Systems: Combining Machine Learning and Learner ChoiceBenjamin Clément (1 adn 3), Hélène Sauzéon (1 and 2), Didier Roy (1), Pierre-Yves Oudeyer (1) ((1) Inria FLOWERS team Talence France, (2) Université de Bordeaux BPH lab Bordeaux France, (3) EvidenceB Paris France)Comments: 29 pages, 37 figuresSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Large class sizes pose challenges to personalized learning in schools, which educational technologies, especially intelligent tutoring systems (ITS), aim to address. In this context, the ZPDES algorithm, based on the Learning Progress Hypothesis (LPH) and multi-armed bandit machine learning techniques, sequences exercises that maximize learning progress (LP). This algorithm was previously shown in field studies to boost learning performances for a wider diversity of students compared to a hand-designed curriculum. However, its motivational impact was not assessed. Also, ZPDES did not allow students to express choices. This limitation in agency is at odds with the LPH theory concerned with modeling curiosity-driven learning. We here study how the introduction of such choice possibilities impact both learning efficiency and motivation. The given choice concerns dimensions that are orthogonal to exercise difficulty, acting as a playful feature.
In an extensive field study (265 7-8 years old children, RCT design), we compare systems based either on ZPDES or a hand-designed curriculum, both with and without self-choice. We first show that ZPDES improves learning performance and produces a positive and motivating learning experience. We then show that the addition of choice triggers intrinsic motivation and reinforces the learning effectiveness of the LP-based personalization. In doing so, it strengthens the links between intrinsic motivation and performance progress during the serious game. Conversely, deleterious effects of the playful feature are observed for hand-designed linear paths. Thus, the intrinsic motivation elicited by a playful feature is beneficial only if the curriculum personalization is effective for the learner. Such a result deserves great attention due to increased use of playful features in non adaptive educational technologies. - [592] arXiv:2402.01672 (cross-list from cs.CY) [ pdf , ps , html , other ]
-
Title: Prerequisite Structure Discovery in Intelligent Tutoring SystemsLouis Annabi (Flowers, U2IS), Sao Mai NguyenJournal-ref: 2023 IEEE International Conference on Development and Learning (ICDL), Nov 2023, Macau, China. pp.176-181Subjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This paper addresses the importance of Knowledge Structure (KS) and Knowledge Tracing (KT) in improving the recommendation of educational content in intelligent tutoring systems. The KS represents the relations between different Knowledge Components (KCs), while KT predicts a learner's success based on her past history. The contribution of this research includes proposing a KT model that incorporates the KS as a learnable parameter, enabling the discovery of the underlying KS from learner trajectories. The quality of the uncovered KS is assessed by using it to recommend content and evaluating the recommendation algorithm with simulated students.
- [593] arXiv:2402.01673 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Legal and ethical implications of applications based on agreement technologies: the case of auction-based road intersectionsJosé-Antonio Santos , Alberto Fernández , Mar Moreno-Rebato , Holger Billhardt , José-A. Rodríguez-García , Sascha OssowskiJournal-ref: Artif. Intell. Law 28(4): 385-414 (2020)Subjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: Agreement Technologies refer to a novel paradigm for the construction of distributed intelligent systems, where autonomous software agents negotiate to reach agreements on behalf of their human users. Smart Cities are a key application domain for Agreement Technologies. While several proofs of concept and prototypes exist, such systems are still far from ready for being deployed in the real-world. In this paper we focus on a novel method for managing elements of smart road infrastructures of the future, namely the case of auction-based road intersections. We show that, even though the key technological elements for such methods are already available, there are multiple non-technical issues that need to be tackled before they can be applied in practice. For this purpose, we analyse legal and ethical implications of auction-based road intersections in the context of international regulations and from the standpoint of the Spanish legislation. From this exercise, we extract a set of required modifications, of both technical and legal nature, which need to be addressed so as to pave the way for the potential real-world deployment of such systems in a future that may not be too far away.
- [594] arXiv:2402.01676 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Language models align with human judgments on key grammatical constructionsComments: Response to Dentella et al. (2023)Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Do Large Language Models (LLMs) make human-like linguistic generalizations? Dentella et al. (2023; "DGL") prompt several LLMs ("Is the following sentence grammatically correct in English?") to elicit grammaticality judgments of 80 English sentences, concluding that LLMs demonstrate a "yes-response bias" and a "failure to distinguish grammatical from ungrammatical sentences". We re-evaluate LLM performance using well-established practices and find that DGL's data in fact provide evidence for just how well LLMs capture human behaviors. Models not only achieve high accuracy overall, but also capture fine-grained variation in human linguistic judgments.
- [595] arXiv:2402.01679 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: STICKERCONV: Generating Multimodal Empathetic Responses from ScratchYiqun Zhang , Fanheng Kong , Peidong Wang , Shuang Sun , Lingshuai Wang , Shi Feng , Daling Wang , Yifei Zhang , Kaisong SongSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Stickers, while widely recognized for enhancing empathetic communication in online interactions, remain underexplored in current empathetic dialogue research, notably due to the challenge of a lack of comprehensive datasets. In this paper, we introduce the Agent for STICKERCONV (Agent4SC), which uses collaborative agent interactions to realistically simulate human behavior with sticker usage, thereby enhancing multimodal empathetic communication. Building on this foundation, we develop a multimodal empathetic dialogue dataset, STICKERCONV, comprising 12.9K dialogue sessions, 5.8K unique stickers, and 2K diverse conversational scenarios. This dataset serves as a benchmark for multimodal empathetic generation. To advance further, we propose PErceive and Generate Stickers (PEGS), a multimodal empathetic response generation framework, complemented by a comprehensive set of empathy evaluation metrics based on LLM. Our experiments demonstrate PEGS's effectiveness in generating contextually relevant and emotionally resonant multimodal empathetic responses, contributing to the advancement of more nuanced and engaging empathetic dialogue systems.
- [596] arXiv:2402.01680 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Large Language Model based Multi-Agents: A Survey of Progress and ChallengesTaicheng Guo , Xiuying Chen , Yaqi Wang , Ruidi Chang , Shichao Pei , Nitesh V. Chawla , Olaf Wiest , Xiangliang ZhangComments: This work is ongoing and we welcome your contribution!Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Abstract: Large Language Models (LLMs) have achieved remarkable success across a wide array of tasks. Due to the impressive planning and reasoning abilities of LLMs, they have been used as autonomous agents to do many tasks automatically. Recently, based on the development of using one LLM as a single planning or decision-making agent, LLM-based multi-agent systems have achieved considerable progress in complex problem-solving and world simulation. To provide the community with an overview of this dynamic field, we present this survey to offer an in-depth discussion on the essential aspects of multi-agent systems based on LLMs, as well as the challenges. Our goal is for readers to gain substantial insights on the following questions: What domains and environments do LLM-based multi-agents simulate? How are these agents profiled and how do they communicate? What mechanisms contribute to the growth of agents' capacities? For those interested in delving into this field of study, we also summarize the commonly used datasets or benchmarks for them to have convenient access. To keep researchers updated on the latest studies, we maintain an open-source GitHub repository, dedicated to outlining the research on LLM-based multi-agent systems.
- [597] arXiv:2402.01681 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Emojis Decoded: Leveraging ChatGPT for Enhanced Understanding in Social Media CommunicationsComments: 12 pages, 2 page appendixSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Emojis, which encapsulate semantics beyond mere words or phrases, have become prevalent in social network communications. This has spurred increasing scholarly interest in exploring their attributes and functionalities. However, emoji-related research and application face two primary challenges. First, researchers typically rely on crowd-sourcing to annotate emojis in order to understand their sentiments, usage intentions, and semantic meanings. Second, subjective interpretations by users can often lead to misunderstandings of emojis and cause the communication barrier. Large Language Models (LLMs) have achieved significant success in various annotation tasks, with ChatGPT demonstrating expertise across multiple domains. In our study, we assess ChatGPT's effectiveness in handling previously annotated and downstream tasks. Our objective is to validate the hypothesis that ChatGPT can serve as a viable alternative to human annotators in emoji research and that its ability to explain emoji meanings can enhance clarity and transparency in online communications. Our findings indicate that ChatGPT has extensive knowledge of emojis. It is adept at elucidating the meaning of emojis across various application scenarios and demonstrates the potential to replace human annotators in a range of tasks.
- [598] arXiv:2402.01684 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: A Framework to Implement 1+N Multi-task Fine-tuning Pattern in LLMs Using the CGC-LORA AlgorithmSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: With the productive evolution of large language models (LLMs) in the field of natural language processing (NLP), tons of effort has been made to effectively fine-tune common pre-trained LLMs to fulfill a variety of tasks in one or multiple specific domain. In practice, there are two prevailing ways, in which the adaptation can be achieved: (i) Multiple Independent Models: Pre-trained LLMs are fine-tuned a few times independently using the corresponding training samples from each task. (ii) An Integrated Model: Samples from all tasks are employed to fine-tune a pre-trianed LLM unitedly. To address the high computing cost and seesawing issue simultaneously, we propose a unified framework that implements a 1 + N mutli-task fine-tuning pattern in LLMs using a novel Customized Gate Control (CGC) Low-rank Adaptation (LoRA) algorithm. Our work aims to take an advantage of both MTL (i.e., CGC) and PEFT (i.e., LoRA) scheme. For a given cluster of tasks, we design an innovative layer that contains two types of experts as additional trainable parameters to make LoRA be compatible with MTL. To comprehensively evaluate the proposed framework, we conduct well-designed experiments on two public datasets. The experimental results demonstrate that the unified framework with CGC-LoRA modules achieves higher evaluation scores than all benchmarks on both two datasets.
- [599] arXiv:2402.01685 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: SMUTF: Schema Matching Using Generative Tags and Hybrid FeaturesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Databases (cs.DB)
Abstract: We introduce SMUTF, a unique approach for large-scale tabular data schema matching (SM), which assumes that supervised learning does not affect performance in open-domain tasks, thereby enabling effective cross-domain matching. This system uniquely combines rule-based feature engineering, pre-trained language models, and generative large language models. In an innovative adaptation inspired by the Humanitarian Exchange Language, we deploy 'generative tags' for each data column, enhancing the effectiveness of SM. SMUTF exhibits extensive versatility, working seamlessly with any pre-existing pre-trained embeddings, classification methods, and generative models.
Recognizing the lack of extensive, publicly available datasets for SM, we have created and open-sourced the HDXSM dataset from the public humanitarian data. We believe this to be the most exhaustive SM dataset currently available. In evaluations across various public datasets and the novel HDXSM dataset, SMUTF demonstrated exceptional performance, surpassing existing state-of-the-art models in terms of accuracy and efficiency, and} improving the F1 score by 11.84% and the AUC of ROC by 5.08%. - [600] arXiv:2402.01686 (cross-list from cs.CY) [ pdf , ps , html , other ]
-
Title: A Systematic Mapping Study of Digital Twins for Diagnosis in TransportationLiliana Marie Prikler , Franz Wotawa (Graz University of Technology, Institute for Software Technology)Journal-ref: 2023 10th International Conference on Dependable Systems and Their Applications (DSA), Tokyo, Japan, 2023, pp. 431-442Subjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Abstract: In recent years, digital twins have been proposed and implemented in various fields with potential applications ranging from prototyping to maintenance. Going forward, they are to enable numerous efficient and sustainable technologies, among them autonomous cars. However, despite a large body of research in many fields, academics have yet to agree on what exactly a digital twin is -- and as a result, what its capabilities and limitations might be. To further our understanding, we explore the capabilities of digital twins concerning diagnosis in the field of transportation. We conduct a systematic mapping study including digital twins of vehicles and their components, as well as transportation infrastructure. We discovered that few papers on digital twins describe any diagnostic process. Furthermore, most existing approaches appear limited to system monitoring or fault detection. These findings suggest that we need more research for diagnostic reasoning utilizing digital twins.
- [601] arXiv:2402.01688 (cross-list from cs.CY) [ pdf , ps , html , other ]
-
Title: An Online Hierarchical Energy Management System for Energy Communities, Complying with the Current Technical Legislation FrameworkComments: 26 pages, 18 figuresSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: Efforts in the fight against Climate Change are increasingly oriented towards new energy efficiency strategies in Smart Grids (SGs). In 2018, with proper legislation, the European Union (EU) defined the Renewable Energy Community (REC) as a local electrical grid whose participants share their self-produced renewable energy, aiming at reducing bill costs by taking advantage of proper incentives. That action aspires to accelerate the spread of local renewable energy exploitation, whose costs could not be within everyone's reach. Since a REC is technically an SG, the strategies above can be applied, and specifically, practical Energy Management Systems (EMSs) are required. Therefore, in this work, an online Hierarchical EMS (HEMS) is synthesized for REC cost minimization to evaluate its superiority over a local self-consumption approach. EU technical indications (as inherited from Italy) are diligently followed, aiming for results that are as realistic as possible. Power flows between REC nodes, or Microgrids (MGs) are optimized by taking Energy Storage Systems (ESSs) and PV plant costs, energy purchase costs, and REC incentives. A hybrid Fuzzy Inference System - Genetic Algorithm (FIS-GA) model is implemented with the GA encoding the FIS parameters. Power generation and consumption, which are the overall system input, are predicted by a LSTM trained on historical data. The proposed hierarchical model achieves good precision in short computation times and outperforms the self-consumption approach, leading to about 20% savings compared to the latter. In addition, the Explainable AI (XAI), which characterizes the model through the FIS, makes results more reliable thanks to an excellent human interpretation level. To finish, the HEMS is parametrized so that it is straightforward to switch to another Country's technical legislation framework.
- [602] arXiv:2402.01691 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Investigating Algorithm Review Boards for Organizational Responsible Artificial Intelligence GovernanceSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: Organizations including companies, nonprofits, governments, and academic institutions are increasingly developing, deploying, and utilizing artificial intelligence (AI) tools. Responsible AI (RAI) governance approaches at organizations have emerged as important mechanisms to address potential AI risks and harms. In this work, we interviewed 17 technical contributors across organization types (Academic, Government, Industry, Nonprofit) and sectors (Finance, Health, Tech, Other) about their experiences with internal RAI governance. Our findings illuminated the variety of organizational definitions of RAI and accompanying internal governance approaches. We summarized the first detailed findings on algorithm review boards (ARBs) and similar review committees in practice, including their membership, scope, and measures of success. We confirmed known robust model governance in finance sectors and revealed extensive algorithm and AI governance with ARB-like review boards in health sectors. Our findings contradict the idea that Institutional Review Boards alone are sufficient for algorithm governance and posit that ARBs are among the more impactful internal RAI governance approaches. Our results suggest that integration with existing internal regulatory approaches and leadership buy-in are among the most important attributes for success and that financial tensions are the greatest challenge to effective organizational RAI. We make a variety of suggestions for how organizational partners can learn from these findings when building their own internal RAI frameworks. We outline future directions for developing and measuring effectiveness of ARBs and other internal RAI governance approaches.
- [603] arXiv:2402.01693 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Quality of Answers of Generative Large Language Models vs Peer Patients for Interpreting Lab Test Results for Lay Patients: Evaluation StudyZhe He , Balu Bhasuran , Qiao Jin , Shubo Tian , Karim Hanna , Cindy Shavor , Lisbeth Garcia Arguello , Patrick Murray , Zhiyong LuSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Lab results are often confusing and hard to understand. Large language models (LLMs) such as ChatGPT have opened a promising avenue for patients to get their questions answered. We aim to assess the feasibility of using LLMs to generate relevant, accurate, helpful, and unharmful responses to lab test-related questions asked by patients and to identify potential issues that can be mitigated with augmentation approaches. We first collected lab test results related question and answer data from Yahoo! Answers and selected 53 QA pairs for this study. Using the LangChain framework and ChatGPT web portal, we generated responses to the 53 questions from four LLMs including GPT-4, Meta LLaMA 2, MedAlpaca, and ORCA_mini. We first assessed the similarity of their answers using standard QA similarity-based evaluation metrics including ROUGE, BLEU, METEOR, BERTScore. We also utilized an LLM-based evaluator to judge whether a target model has higher quality in terms of relevance, correctness, helpfulness, and safety than the baseline model. Finally, we performed a manual evaluation with medical experts for all the responses to seven selected questions on the same four aspects. The results of Win Rate and medical expert evaluation both showed that GPT-4's responses achieved better scores than all the other LLM responses and human responses on all four aspects (relevance, correctness, helpfulness, and safety). However, LLM responses occasionally also suffer from a lack of interpretation in one's medical context, incorrect statements, and lack of references. We find that compared to other three LLMs and human answer from the Q&A website, GPT-4's responses are more accurate, helpful, relevant, and safer. However, there are cases which GPT-4 responses are inaccurate and not individualized. We identified a number of ways to improve the quality of LLM responses.
- [604] arXiv:2402.01694 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: ARGS: Alignment as Reward-Guided SearchComments: ICLR 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Aligning large language models with human objectives is paramount, yet common approaches including RLHF suffer from unstable and resource-intensive training. In response to this challenge, we introduce ARGS, Alignment as Reward-Guided Search, a novel framework that integrates alignment into the decoding process, eliminating the need for expensive RL training. By adjusting the model's probabilistic predictions using a reward signal, ARGS generates texts with semantic diversity while being aligned with human preferences, offering a promising and flexible solution for aligning language models. Notably, ARGS demonstrates consistent enhancements in average reward compared to baselines across diverse alignment tasks and various model dimensions. For example, under the same greedy-based decoding strategy, our method improves the average reward by 19.56% relative to the baseline and secures a preference or tie score of 64.33% in GPT-4 evaluation. We believe that our framework, emphasizing decoding-time alignment, paves the way for more responsive language models in the future. Code is publicly available at: \url{ this https URL }.
- [605] arXiv:2402.01695 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Language-Guided World Models: A Model-Based Approach to AI ControlSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Installing probabilistic world models into artificial agents opens an efficient channel for humans to communicate with and control these agents. In addition to updating agent policies, humans can modify their internal world models in order to influence their decisions. The challenge, however, is that currently existing world models are difficult for humans to adapt because they lack a natural communication interface. Aimed at addressing this shortcoming, we develop Language-Guided World Models (LWMs), which can capture environment dynamics by reading language descriptions. These models enhance agent communication efficiency, allowing humans to simultaneously alter their behavior on multiple tasks with concise language feedback. They also enable agents to self-learn from texts originally written to instruct humans. To facilitate the development of LWMs, we design a challenging benchmark based on the game of MESSENGER (Hanjie et al., 2021), requiring compositional generalization to new language descriptions and environment dynamics. Our experiments reveal that the current state-of-the-art Transformer architecture performs poorly on this benchmark, motivating us to design a more robust architecture. To showcase the practicality of our proposed LWMs, we simulate a scenario where these models augment the interpretability and safety of an agent by enabling it to generate and discuss plans with a human before execution. By effectively incorporating language feedback on the plan, the models boost the agent performance in the real environment by up to three times without collecting any interactive experiences in this environment.
- [606] arXiv:2402.01698 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Large language model empowered participatory urban planningComments: 26 pages, 7 figures, 2 tablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Participatory urban planning is the mainstream of modern urban planning and involves the active engagement of different stakeholders. However, the traditional participatory paradigm encounters challenges in time and manpower, while the generative planning tools fail to provide adjustable and inclusive solutions. This research introduces an innovative urban planning approach integrating Large Language Models (LLMs) within the participatory process. The framework, based on the crafted LLM agent, consists of role-play, collaborative generation, and feedback iteration, solving a community-level land-use task catering to 1000 distinct interests. Empirical experiments in diverse urban communities exhibit LLM's adaptability and effectiveness across varied planning scenarios. The results were evaluated on four metrics, surpassing human experts in satisfaction and inclusion, and rivaling state-of-the-art reinforcement learning methods in service and ecology. Further analysis shows the advantage of LLM agents in providing adjustable and inclusive solutions with natural language reasoning and strong scalability. While implementing the recent advancements in emulating human behavior for planning, this work envisions both planners and citizens benefiting from low-cost, efficient LLM agents, which is crucial for enhancing participation and realizing participatory urban planning.
- [607] arXiv:2402.01700 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Question answering systems for health professionals at the point of care -- a systematic reviewGregory Kell , Angus Roberts , Serge Umansky , Linglong Qian , Davide Ferrari , Frank Soboczenski , Byron Wallace , Nikhil Patel , Iain J MarshallComments: Accepted to the Journal of the American Medical Informatics Association (JAMIA)Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Objective: Question answering (QA) systems have the potential to improve the quality of clinical care by providing health professionals with the latest and most relevant evidence. However, QA systems have not been widely adopted. This systematic review aims to characterize current medical QA systems, assess their suitability for healthcare, and identify areas of improvement.
Materials and methods: We searched PubMed, IEEE Xplore, ACM Digital Library, ACL Anthology and forward and backward citations on 7th February 2023. We included peer-reviewed journal and conference papers describing the design and evaluation of biomedical QA systems. Two reviewers screened titles, abstracts, and full-text articles. We conducted a narrative synthesis and risk of bias assessment for each study. We assessed the utility of biomedical QA systems.
Results: We included 79 studies and identified themes, including question realism, answer reliability, answer utility, clinical specialism, systems, usability, and evaluation methods. Clinicians' questions used to train and evaluate QA systems were restricted to certain sources, types and complexity levels. No system communicated confidence levels in the answers or sources. Many studies suffered from high risks of bias and applicability concerns. Only 8 studies completely satisfied any criterion for clinical utility, and only 7 reported user evaluations. Most systems were built with limited input from clinicians.
Discussion: While machine learning methods have led to increased accuracy, most studies imperfectly reflected real-world healthcare information needs. Key research priorities include developing more realistic healthcare QA datasets and considering the reliability of answer sources, rather than merely focusing on accuracy. - [608] arXiv:2402.01702 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Fluent dreaming for language modelsComments: 11 pages, 6 figures, 4 tablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Feature visualization, also known as "dreaming", offers insights into vision models by optimizing the inputs to maximize a neuron's activation or other internal component. However, dreaming has not been successfully applied to language models because the input space is discrete. We extend Greedy Coordinate Gradient, a method from the language model adversarial attack literature, to design the Evolutionary Prompt Optimization (EPO) algorithm. EPO optimizes the input prompt to simultaneously maximize the Pareto frontier between a chosen internal feature and prompt fluency, enabling fluent dreaming for language models. We demonstrate dreaming with neurons, output logits and arbitrary directions in activation space. We measure the fluency of the resulting prompts and compare language model dreaming with max-activating dataset examples. Critically, fluent dreaming allows automatically exploring the behavior of model internals in reaction to mildly out-of-distribution prompts. Code for running EPO is available at this https URL . A companion page demonstrating code usage is at this https URL
- [609] arXiv:2402.01703 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: A Multi-Perspective Machine Learning Approach to Evaluate Police-Driver Interaction in Los AngelesBenjamin A.T. Grahama , Lauren Brown , Georgios Chochlakis , Morteza Dehghani , Raquel Delerme , Brittany Friedman , Ellie Graeden , Preni Golazizian , Rajat Hebbar , Parsa Hejabi , Aditya Kommineni , Mayagüez Salinas , Michael Sierra-Arévalo , Jackson Trager , Nicholas Weller , Shrikanth NarayananComments: 13 pagesSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Abstract: Interactions between the government officials and civilians affect public wellbeing and the state legitimacy that is necessary for the functioning of democratic society. Police officers, the most visible and contacted agents of the state, interact with the public more than 20 million times a year during traffic stops. Today, these interactions are regularly recorded by body-worn cameras (BWCs), which are lauded as a means to enhance police accountability and improve police-public interactions. However, the timely analysis of these recordings is hampered by a lack of reliable automated tools that can enable the analysis of these complex and contested police-public interactions. This article proposes an approach to developing new multi-perspective, multimodal machine learning (ML) tools to analyze the audio, video, and transcript information from this BWC footage. Our approach begins by identifying the aspects of communication most salient to different stakeholders, including both community members and police officers. We move away from modeling approaches built around the existence of a single ground truth and instead utilize new advances in soft labeling to incorporate variation in how different observers perceive the same interactions. We argue that this inclusive approach to the conceptualization and design of new ML tools is broadly applicable to the study of communication and development of analytic tools across domains of human interaction, including education, medicine, and the workplace.
- [610] arXiv:2402.01704 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: States as Strings as Strategies: Steering Language Models with Game-Theoretic SolversIan Gemp , Yoram Bachrach , Marc Lanctot , Roma Patel , Vibhavari Dasagi , Luke Marris , Georgios Piliouras , Siqi Liu , Karl TuylsComments: 32 pages, 8 figures, code available @ this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)
Abstract: Game theory is the study of mathematical models of strategic interactions among rational agents. Language is a key medium of interaction for humans, though it has historically proven difficult to model dialogue and its strategic motivations mathematically. A suitable model of the players, strategies, and payoffs associated with linguistic interactions (i.e., a binding to the conventional symbolic logic of game theory) would enable existing game-theoretic algorithms to provide strategic solutions in the space of language. In other words, a binding could provide a route to computing stable, rational conversational strategies in dialogue. Large language models (LLMs) have arguably reached a point where their generative capabilities can enable realistic, human-like simulations of natural dialogue. By prompting them in various ways, we can steer their responses towards different output utterances. Leveraging the expressivity of natural language, LLMs can also help us quickly generate new dialogue scenarios, which are grounded in real world applications. In this work, we present one possible binding from dialogue to game theory as well as generalizations of existing equilibrium finding algorithms to this setting. In addition, by exploiting LLMs generation capabilities along with our proposed binding, we can synthesize a large repository of formally-defined games in which one can study and test game-theoretic solution concepts. We also demonstrate how one can combine LLM-driven game generation, game-theoretic solvers, and imitation learning to construct a process for improving the strategic capabilities of LLMs.
- [611] arXiv:2402.01705 (cross-list from cs.CY) [ pdf , ps , html , other ]
-
Title: Beyond Behaviorist Representational Harms: A Plan for Measurement and MitigationComments: 23 pages, 7 figuresSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Algorithmic harms are commonly categorized as either allocative or representational. This study specifically addresses the latter, focusing on an examination of current definitions of representational harms to discern what is included and what is not. This analysis motivates our expansion beyond behavioral definitions to encompass harms to cognitive and affective states. The paper outlines high-level requirements for measurement: identifying the necessary expertise to implement this approach and illustrating it through a case study. Our work highlights the unique vulnerabilities of large language models to perpetrating representational harms, particularly when these harms go unmeasured and unmitigated. The work concludes by presenting proposed mitigations and delineating when to employ them. The overarching aim of this research is to establish a framework for broadening the definition of representational harms and to translate insights from fairness research into practical measurement and mitigation praxis.
- [612] arXiv:2402.01708 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Not My Voice! A Taxonomy of Ethical and Safety Harms of Speech GeneratorsComments: 24 pages, 4 tables, 4 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Audio and Speech Processing (eess.AS)
Abstract: The rapid and wide-scale adoption of AI to generate human speech poses a range of significant ethical and safety risks to society that need to be addressed. For example, a growing number of speech generation incidents are associated with swatting attacks in the United States, where anonymous perpetrators create synthetic voices that call police officers to close down schools and hospitals, or to violently gain access to innocent citizens' homes. Incidents like this demonstrate that multimodal generative AI risks and harms do not exist in isolation, but arise from the interactions of multiple stakeholders and technical AI systems. In this paper we analyse speech generation incidents to study how patterns of specific harms arise. We find that specific harms can be categorised according to the exposure of affected individuals, that is to say whether they are a subject of, interact with, suffer due to, or are excluded from speech generation systems. Similarly, specific harms are also a consequence of the motives of the creators and deployers of the systems. Based on these insights we propose a conceptual framework for modelling pathways to ethical and safety harms of AI, which we use to develop a taxonomy of harms of speech generators. Our relational approach captures the complexity of risks and harms in sociotechnical AI systems, and yields an extensible taxonomy that can support appropriate policy interventions and decision making for responsible multimodal model development and release of speech generators.
- [613] arXiv:2402.01711 (cross-list from cs.CY) [ pdf , ps , html , other ]
-
Title: LLM on FHIR -- Demystifying Health RecordsPaul Schmiedmayer , Adrit Rao , Philipp Zagar , Vishnu Ravi , Aydin Zahedivash , Arash Fereydooni , Oliver AalamiComments: Pre-print of the paper submitted to the Call for Papers for the Special Focus Issue on ChatGPT and Large Language Models (LLMs) in Biomedicine and Health at the Journal of the American Medical Informatics Association: this https URLSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: Objective: To enhance health literacy and accessibility of health information for a diverse patient population by developing a patient-centered artificial intelligence (AI) solution using large language models (LLMs) and Fast Healthcare Interoperability Resources (FHIR) application programming interfaces (APIs). Materials and Methods: The research involved developing LLM on FHIR, an open-source mobile application allowing users to interact with their health records using LLMs. The app is built on Stanford's Spezi ecosystem and uses OpenAI's GPT-4. A pilot study was conducted with the SyntheticMass patient dataset and evaluated by medical experts to assess the app's effectiveness in increasing health literacy. The evaluation focused on the accuracy, relevance, and understandability of the LLM's responses to common patient questions. Results: LLM on FHIR demonstrated varying but generally high degrees of accuracy and relevance in providing understandable health information to patients. The app effectively translated medical data into patient-friendly language and was able to adapt its responses to different patient profiles. However, challenges included variability in LLM responses and the need for precise filtering of health data. Discussion and Conclusion: LLMs offer significant potential in improving health literacy and making health records more accessible. LLM on FHIR, as a pioneering application in this field, demonstrates the feasibility and challenges of integrating LLMs into patient care. While promising, the implementation and pilot also highlight risks such as inconsistent responses and the importance of replicable output. Future directions include better resource identification mechanisms and executing LLMs on-device to enhance privacy and reduce costs.
- [614] arXiv:2402.01712 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Socially Aware Synthetic Data Generation for Suicidal Ideation Detection Using Large Language ModelsJournal-ref: IEEE AccessSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Suicidal ideation detection is a vital research area that holds great potential for improving mental health support systems. However, the sensitivity surrounding suicide-related data poses challenges in accessing large-scale, annotated datasets necessary for training effective machine learning models. To address this limitation, we introduce an innovative strategy that leverages the capabilities of generative AI models, such as ChatGPT, Flan-T5, and Llama, to create synthetic data for suicidal ideation detection. Our data generation approach is grounded in social factors extracted from psychology literature and aims to ensure coverage of essential information related to suicidal ideation. In our study, we benchmarked against state-of-the-art NLP classification models, specifically, those centered around the BERT family structures. When trained on the real-world dataset, UMD, these conventional models tend to yield F1-scores ranging from 0.75 to 0.87. Our synthetic data-driven method, informed by social factors, offers consistent F1-scores of 0.82 for both models, suggesting that the richness of topics in synthetic data can bridge the performance gap across different model complexities. Most impressively, when we combined a mere 30% of the UMD dataset with our synthetic data, we witnessed a substantial increase in performance, achieving an F1-score of 0.88 on the UMD test set. Such results underscore the cost-effectiveness and potential of our approach in confronting major challenges in the field, such as data scarcity and the quest for diversity in data representation.
- [615] arXiv:2402.01713 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Prompting Large Language Models for Zero-Shot Clinical Prediction with Structured Longitudinal Electronic Health Record DataYinghao Zhu , Zixiang Wang , Junyi Gao , Yuning Tong , Jingkun An , Weibin Liao , Ewen M. Harrison , Liantao Ma , Chengwei PanSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The inherent complexity of structured longitudinal Electronic Health Records (EHR) data poses a significant challenge when integrated with Large Language Models (LLMs), which are traditionally tailored for natural language processing. Motivated by the urgent need for swift decision-making during new disease outbreaks, where traditional predictive models often fail due to a lack of historical data, this research investigates the adaptability of LLMs, like GPT-4, to EHR data. We particularly focus on their zero-shot capabilities, which enable them to make predictions in scenarios in which they haven't been explicitly trained. In response to the longitudinal, sparse, and knowledge-infused nature of EHR data, our prompting approach involves taking into account specific EHR characteristics such as units and reference ranges, and employing an in-context learning strategy that aligns with clinical contexts. Our comprehensive experiments on the MIMIC-IV and TJH datasets demonstrate that with our elaborately designed prompting framework, LLMs can improve prediction performance in key tasks such as mortality, length-of-stay, and 30-day readmission by about 35\%, surpassing ML models in few-shot settings. Our research underscores the potential of LLMs in enhancing clinical decision-making, especially in urgent healthcare situations like the outbreak of emerging diseases with no labeled data. The code is publicly available at this https URL for reproducibility.
- [616] arXiv:2402.01714 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: TrICy: Trigger-guided Data-to-text Generation with Intent aware Attention-CopyComments: Published in the IEEE/ACM Transactions on Audio, Speech, and Language Processing. (Sourav Ghosh and Vibhav Agarwal contributed equally to this work.)Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 1173-1184, 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Data-to-text (D2T) generation is a crucial task in many natural language understanding (NLU) applications and forms the foundation of task-oriented dialog systems. In the context of conversational AI solutions that can work directly with local data on the user's device, architectures utilizing large pre-trained language models (PLMs) are impractical for on-device deployment due to a high memory footprint. To this end, we propose TrICy, a novel lightweight framework for an enhanced D2T task that generates text sequences based on the intent in context and may further be guided by user-provided triggers. We leverage an attention-copy mechanism to predict out-of-vocabulary (OOV) words accurately. Performance analyses on E2E NLG dataset (BLEU: 66.43%, ROUGE-L: 70.14%), WebNLG dataset (BLEU: Seen 64.08%, Unseen 52.35%), and our Custom dataset related to text messaging applications, showcase our architecture's effectiveness. Moreover, we show that by leveraging an optional trigger input, data-to-text generation quality increases significantly and achieves the new SOTA score of 69.29% BLEU for E2E NLG. Furthermore, our analyses show that TrICy achieves at least 24% and 3% improvement in BLEU and METEOR respectively over LLMs like GPT-3, ChatGPT, and Llama 2. We also demonstrate that in some scenarios, performance improvement due to triggers is observed even when they are absent in training.
- [617] arXiv:2402.01715 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: ChatGPT vs Gemini vs LLaMA on Multilingual Sentiment AnalysisSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Automated sentiment analysis using Large Language Model (LLM)-based models like ChatGPT, Gemini or LLaMA2 is becoming widespread, both in academic research and in industrial applications. However, assessment and validation of their performance in case of ambiguous or ironic text is still poor. In this study, we constructed nuanced and ambiguous scenarios, we translated them in 10 languages, and we predicted their associated sentiment using popular LLMs. The results are validated against post-hoc human responses. Ambiguous scenarios are often well-coped by ChatGPT and Gemini, but we recognise significant biases and inconsistent performance across models and evaluated human languages. This work provides a standardised methodology for automated sentiment analysis evaluation and makes a call for action to further improve the algorithms and their underlying data, to improve their performance, interpretability and applicability.
- [618] arXiv:2402.01717 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: From RAG to QA-RAG: Integrating Generative AI for Pharmaceutical Regulatory Compliance ProcessJaewoong Kim (Sungkyunkwan University), Moohong Min (Sungkyunkwan University)Comments: Total number of pages: 9. Total number of figures: 2. For the source code and experimental results of this paper, see this https URL . For the dataset used in training and evaluating the model, see this https URL Pharmaceuticals FAQSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract: Regulatory compliance in the pharmaceutical industry entails navigating through complex and voluminous guidelines, often requiring significant human resources. To address these challenges, our study introduces a chatbot model that utilizes generative AI and the Retrieval Augmented Generation (RAG) method. This chatbot is designed to search for guideline documents relevant to the user inquiries and provide answers based on the retrieved guidelines. Recognizing the inherent need for high reliability in this domain, we propose the Question and Answer Retrieval Augmented Generation (QA-RAG) model. In comparative experiments, the QA-RAG model demonstrated a significant improvement in accuracy, outperforming all other baselines including conventional RAG methods. This paper details QA-RAG's structure and performance evaluation, emphasizing its potential for the regulatory compliance domain in the pharmaceutical industry and beyond. We have made our work publicly available for further research and development.
- [619] arXiv:2402.01720 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Deep Learning Based Amharic Chatbot for FAQs in UniversitiesJournal-ref: Machine Learning (cs.LG), V1, 2024Subjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: University students often spend a considerable amount of time seeking answers to common questions from administrators or teachers. This can become tedious for both parties, leading to a need for a solution. In response, this paper proposes a chatbot model that utilizes natural language processing and deep learning techniques to answer frequently asked questions (FAQs) in the Amharic language. Chatbots are computer programs that simulate human conversation through the use of artificial intelligence (AI), acting as a virtual assistant to handle questions and other tasks. The proposed chatbot program employs tokenization, normalization, stop word removal, and stemming to analyze and categorize Amharic input sentences. Three machine learning model algorithms were used to classify tokens and retrieve appropriate responses: Support Vector Machine (SVM), Multinomial Naïve Bayes, and deep neural networks implemented through TensorFlow, Keras, and NLTK. The deep learning model achieved the best results with 91.55% accuracy and a validation loss of 0.3548 using an Adam optimizer and SoftMax activation function. The chatbot model was integrated with Facebook Messenger and deployed on a Heroku server for 24-hour accessibility. The experimental results demonstrate that the chatbot framework achieved its objectives and effectively addressed challenges such as Amharic Fidel variation, morphological variation, and lexical gaps. Future research could explore the integration of Amharic WordNet to narrow the lexical gap and support more complex questions.
- [620] arXiv:2402.01721 (cross-list from cs.CY) [ pdf , ps , html , other ]
-
Title: Non-Consensual Synthetic Intimate Imagery: Prevalence, Attitudes, and Knowledge in 10 CountriesSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Abstract: Deepfake technologies have become ubiquitous, "democratizing" the ability to manipulate photos and videos. One popular use of deepfake technology is the creation of sexually explicit content, which can then be posted and shared widely on the internet. Drawing on a survey of over 16,000 respondents in 10 different countries, this article examines attitudes and behaviors related to "deepfake pornography" as a specific form of non-consensual synthetic intimate imagery (NSII). Our study found that deepfake pornography behaviors were considered harmful by respondents, despite nascent societal awareness. Regarding the prevalence of deepfake porn victimization and perpetration, 2.2% of all respondents indicated personal victimization, and 1.8% all of respondents indicated perpetration behaviors. Respondents from countries with specific legislation still reported perpetration and victimization experiences, suggesting NSII laws are inadequate to deter perpetration. Approaches to prevent and reduce harms may include digital literacy education, as well as enforced platform policies, practices, and tools which better detect, prevent, and respond to NSII content.
- [621] arXiv:2402.01722 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Enhancing Large Language Model Performance To Answer Questions and Extract Information More AccuratelyLiang Zhang , Katherine Jijo , Spurthi Setty , Eden Chung , Fatima Javid , Natan Vidra , Tommy CliffordSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large Language Models (LLMs) generate responses to questions; however, their effectiveness is often hindered by sub-optimal quality of answers and occasional failures to provide accurate responses to questions. To address these challenges, a fine-tuning process is employed, involving feedback and examples to refine models. The objective is to enhance AI models through continuous feedback loops, utilizing metrics such as cosine similarity, LLM evaluation and Rouge-L scores to evaluate the models. Leveraging LLMs like GPT-3.5, GPT4ALL, and LLaMA2, and Claude, this approach is benchmarked on financial datasets, including the FinanceBench and RAG Instruct Benchmark Tester Dataset, illustrating the necessity of fine-tuning. The results showcase the capability of fine-tuned models to surpass the accuracy of zero-shot LLMs, providing superior question and answering capabilities. Notably, the combination of fine-tuning the LLM with a process known as Retrieval Augmented Generation (RAG) proves to generate responses with improved accuracy.
- [622] arXiv:2402.01723 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: An Empirical Study on Large Language Models in Accuracy and Robustness under Chinese Industrial ScenariosZongjie Li , Wenying Qiu , Pingchuan Ma , Yichen Li , You Li , Sijia He , Baozheng Jiang , Shuai Wang , Weixi GuSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Recent years have witnessed the rapid development of large language models (LLMs) in various domains. To better serve the large number of Chinese users, many commercial vendors in China have adopted localization strategies, training and providing local LLMs specifically customized for Chinese users. Furthermore, looking ahead, one of the key future applications of LLMs will be practical deployment in industrial production by enterprises and users in those sectors. However, the accuracy and robustness of LLMs in industrial scenarios have not been well studied. In this paper, we present a comprehensive empirical study on the accuracy and robustness of LLMs in the context of the Chinese industrial production area. We manually collected 1,200 domain-specific problems from 8 different industrial sectors to evaluate LLM accuracy. Furthermore, we designed a metamorphic testing framework containing four industrial-specific stability categories with eight abilities, totaling 13,631 questions with variants to evaluate LLM robustness. In total, we evaluated 9 different LLMs developed by Chinese vendors, as well as four different LLMs developed by global vendors. Our major findings include: (1) Current LLMs exhibit low accuracy in Chinese industrial contexts, with all LLMs scoring less than 0.6. (2) The robustness scores vary across industrial sectors, and local LLMs overall perform worse than global ones. (3) LLM robustness differs significantly across abilities. Global LLMs are more robust under logical-related variants, while advanced local LLMs perform better on problems related to understanding Chinese industrial terminology. Our study results provide valuable guidance for understanding and promoting the industrial domain capabilities of LLMs from both development and industrial enterprise perspectives. The results further motivate possible research directions and tooling support.
- [623] arXiv:2402.01724 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: CERM: Context-aware Literature-based Discovery via Sentiment AnalysisSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Driven by the abundance of biomedical publications, we introduce a sentiment analysis task to understand food-health relationship. Prior attempts to incorporate health into recipe recommendation and analysis systems have primarily focused on ingredient nutritional components or utilized basic computational models trained on curated labeled data. Enhanced models that capture the inherent relationship between food ingredients and biomedical concepts can be more beneficial for food-related research, given the wealth of information in biomedical texts. Considering the costly data labeling process, these models should effectively utilize both labeled and unlabeled data. This paper introduces Entity Relationship Sentiment Analysis (ERSA), a new task that captures the sentiment of a text based on an entity pair. ERSA extends the widely studied Aspect Based Sentiment Analysis (ABSA) task. Specifically, our study concentrates on the ERSA task applied to biomedical texts, focusing on (entity-entity) pairs of biomedical and food concepts. ERSA poses a significant challenge compared to traditional sentiment analysis tasks, as sentence sentiment may not align with entity relationship sentiment. Additionally, we propose CERM, a semi-supervised architecture that combines different word embeddings to enhance the encoding of the ERSA task. Experimental results showcase the model's efficiency across diverse learning scenarios.
- [624] arXiv:2402.01725 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Fortifying Ethical Boundaries in AI: Advanced Strategies for Enhancing Security in Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Recent advancements in large language models (LLMs) have significantly enhanced capabilities in natural language processing and artificial intelligence. These models, including GPT-3.5 and LLaMA-2, have revolutionized text generation, translation, and question-answering tasks due to the transformative Transformer model. Despite their widespread use, LLMs present challenges such as ethical dilemmas when models are compelled to respond inappropriately, susceptibility to phishing attacks, and privacy violations. This paper addresses these challenges by introducing a multi-pronged approach that includes: 1) filtering sensitive vocabulary from user input to prevent unethical responses; 2) detecting role-playing to halt interactions that could lead to 'prison break' scenarios; 3) implementing custom rule engines to restrict the generation of prohibited content; and 4) extending these methodologies to various LLM derivatives like Multi-Model Large Language Models (MLLMs). Our approach not only fortifies models against unethical manipulations and privacy breaches but also maintains their high performance across tasks. We demonstrate state-of-the-art performance under various attack prompts, without compromising the model's core functionalities. Furthermore, the introduction of differentiated security levels empowers users to control their personal data disclosure. Our methods contribute to reducing social risks and conflicts arising from technological abuse, enhance data protection, and promote social equity. Collectively, this research provides a framework for balancing the efficiency of question-answering systems with user privacy and ethical standards, ensuring a safer user experience and fostering trust in AI technology.
- [625] arXiv:2402.01726 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: AI Does Not Alter Perceptions of Text MessagesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Abstract: For many people, anxiety, depression, and other social and mental factors can make composing text messages an active challenge. To remedy this problem, large language models (LLMs) may yet prove to be the perfect tool to assist users that would otherwise find texting difficult or stressful. However, despite rapid uptake in LLM usage, considerations for their assistive usage in text message composition have not been explored. A primary concern regarding LLM usage is that poor public sentiment regarding AI introduces the possibility that its usage may harm perceptions of AI-assisted text messages, making usage counter-productive. To (in)validate this possibility, we explore how the belief that a text message did or did not receive AI assistance in composition alters its perceived tone, clarity, and ability to convey intent. In this study, we survey the perceptions of 26 participants on 18 randomly labeled pre-composed text messages. In analyzing the participants' ratings of message tone, clarity, and ability to convey intent, we find that there is no statistically significant evidence that the belief that AI is utilized alters recipient perceptions. This provides hopeful evidence that LLM-based text message composition assistance can be implemented without the risk of counter-productive outcomes.
- [626] arXiv:2402.01727 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Prompting Diverse Ideas: Increasing AI Idea VarianceSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: Unlike routine tasks where consistency is prized, in creativity and innovation the goal is to create a diverse set of ideas. This paper delves into the burgeoning interest in employing Artificial Intelligence (AI) to enhance the productivity and quality of the idea generation process. While previous studies have found that the average quality of AI ideas is quite high, prior research also has pointed to the inability of AI-based brainstorming to create sufficient dispersion of ideas, which limits novelty and the quality of the overall best idea. Our research investigates methods to increase the dispersion in AI-generated ideas. Using GPT-4, we explore the effect of different prompting methods on Cosine Similarity, the number of unique ideas, and the speed with which the idea space gets exhausted. We do this in the domain of developing a new product development for college students, priced under $50. In this context, we find that (1) pools of ideas generated by GPT-4 with various plausible prompts are less diverse than ideas generated by groups of human subjects (2) the diversity of AI generated ideas can be substantially improved using prompt engineering (3) Chain-of-Thought (CoT) prompting leads to the highest diversity of ideas of all prompts we evaluated and was able to come close to what is achieved by groups of human subjects. It also was capable of generating the highest number of unique ideas of any prompt we studied.
- [627] arXiv:2402.01728 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Hardware Phi-1.5B: A Large Language Model Encodes Hardware Domain Specific KnowledgeWeimin Fu , Shijie Li , Yifang Zhao , Haocheng Ma , Raj Dutta , Xuan Zhang , Kaichen Yang , Yier Jin , Xiaolong GuoComments: 6 pages, 6 figuresJournal-ref: 29th IEEE/ACM Asia and South Pacific Design Automation Conference (ASP-DAC); 2024 January; Incheon Songdo Convensia, South KoreaSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR)
Abstract: In the rapidly evolving semiconductor industry, where research, design, verification, and manufacturing are intricately linked, the potential of Large Language Models to revolutionize hardware design and security verification is immense. The primary challenge, however, lies in the complexity of hardware specific issues that are not adequately addressed by the natural language or software code knowledge typically acquired during the pretraining stage. Additionally, the scarcity of datasets specific to the hardware domain poses a significant hurdle in developing a foundational model. Addressing these challenges, this paper introduces Hardware Phi 1.5B, an innovative large language model specifically tailored for the hardware domain of the semiconductor industry. We have developed a specialized, tiered dataset comprising small, medium, and large subsets and focused our efforts on pretraining using the medium dataset. This approach harnesses the compact yet efficient architecture of the Phi 1.5B model. The creation of this first pretrained, hardware domain specific large language model marks a significant advancement, offering improved performance in hardware design and verification tasks and illustrating a promising path forward for AI applications in the semiconductor sector.
- [628] arXiv:2402.01729 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Contextualization Distillation from Large Language Model for Knowledge Graph CompletionComments: Accepted by EACL 2024 findings v3: add missing citationsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: While textual information significantly enhances the performance of pre-trained language models (PLMs) in knowledge graph completion (KGC), the static and noisy nature of existing corpora collected from Wikipedia articles or synsets definitions often limits the potential of PLM-based KGC models. To surmount these challenges, we introduce the Contextualization Distillation strategy, a versatile plug-in-and-play approach compatible with both discriminative and generative KGC frameworks. Our method begins by instructing large language models (LLMs) to transform compact, structural triplets into context-rich segments. Subsequently, we introduce two tailored auxiliary tasks, reconstruction and contextualization, allowing smaller KGC models to assimilate insights from these enriched triplets. Comprehensive evaluations across diverse datasets and KGC techniques highlight the efficacy and adaptability of our approach, revealing consistent performance enhancements irrespective of underlying pipelines or architectures. Moreover, our analysis makes our method more explainable and provides insight into generating path selection, as well as the choosing of suitable distillation tasks. All the code and data in this work will be released at this https URL
- [629] arXiv:2402.01730 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Evaluating LLM -- Generated Multimodal Diagnosis from Medical Images and Symptom AnalysisComments: Department of Informatics, University of Piraeus, GreeceSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Large language models (LLMs) constitute a breakthrough state-of-the-art Artificial Intelligence technology which is rapidly evolving and promises to aid in medical diagnosis. However, the correctness and the accuracy of their returns has not yet been properly evaluated. In this work, we propose an LLM evaluation paradigm that incorporates two independent steps of a novel methodology, namely (1) multimodal LLM evaluation via structured interactions and (2) follow-up, domain-specific analysis based on data extracted via the previous interactions. Using this paradigm, (1) we evaluate the correctness and accuracy of LLM-generated medical diagnosis with publicly available multimodal multiple-choice questions(MCQs) in the domain of Pathology and (2) proceed to a systemic and comprehensive analysis of extracted results. We used GPT-4-Vision-Preview as the LLM to respond to complex, medical questions consisting of both images and text, and we explored a wide range of diseases, conditions, chemical compounds, and related entity types that are included in the vast knowledge domain of Pathology. GPT-4-Vision-Preview performed quite well, scoring approximately 84\% of correct diagnoses. Next, we further analyzed the findings of our work, following an analytical approach which included Image Metadata Analysis, Named Entity Recognition and Knowledge Graphs. Weaknesses of GPT-4-Vision-Preview were revealed on specific knowledge paths, leading to a further understanding of its shortcomings in specific areas. Our methodology and findings are not limited to the use of GPT-4-Vision-Preview, but a similar approach can be followed to evaluate the usefulness and accuracy of other LLMs and, thus, improve their use with further optimization.
- [630] arXiv:2402.01732 (cross-list from cs.CY) [ pdf , ps , html , other ]
-
Title: Identifying and Improving Disability Bias in GAI-Based Resume ScreeningSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: As Generative AI rises in adoption, its use has expanded to include domains such as hiring and recruiting. However, without examining the potential of bias, this may negatively impact marginalized populations, including people with disabilities. To address this important concern, we present a resume audit study, in which we ask ChatGPT (specifically, GPT-4) to rank a resume against the same resume enhanced with an additional leadership award, scholarship, panel presentation, and membership that are disability related. We find that GPT-4 exhibits prejudice towards these enhanced CVs. Further, we show that this prejudice can be quantifiably reduced by training a custom GPTs on principles of DEI and disability justice. Our study also includes a unique qualitative analysis of the types of direct and indirect ableism GPT-4 uses to justify its biased decisions and suggest directions for additional bias mitigation work. Additionally, since these justifications are presumably drawn from training data containing real-world biased statements made by humans, our analysis suggests additional avenues for understanding and addressing human bias.
- [631] arXiv:2402.01733 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Development and Testing of Retrieval Augmented Generation in Large Language Models -- A Case Study ReportYuHe Ke , Liyuan Jin , Kabilan Elangovan , Hairil Rizal Abdullah , Nan Liu , Alex Tiong Heng Sia , Chai Rick Soh , Joshua Yi Min Tung , Jasmine Chiat Ling Ong , Daniel Shu Wei TingComments: NASubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Purpose: Large Language Models (LLMs) hold significant promise for medical applications. Retrieval Augmented Generation (RAG) emerges as a promising approach for customizing domain knowledge in LLMs. This case study presents the development and evaluation of an LLM-RAG pipeline tailored for healthcare, focusing specifically on preoperative medicine.
Methods: We developed an LLM-RAG model using 35 preoperative guidelines and tested it against human-generated responses, with a total of 1260 responses evaluated. The RAG process involved converting clinical documents into text using Python-based frameworks like LangChain and Llamaindex, and processing these texts into chunks for embedding and retrieval. Vector storage techniques and selected embedding models to optimize data retrieval, using Pinecone for vector storage with a dimensionality of 1536 and cosine similarity for loss metrics. Human-generated answers, provided by junior doctors, were used as a comparison.
Results: The LLM-RAG model generated answers within an average of 15-20 seconds, significantly faster than the 10 minutes typically required by humans. Among the basic LLMs, GPT4.0 exhibited the best accuracy of 80.1%. This accuracy was further increased to 91.4% when the model was enhanced with RAG. Compared to the human-generated instructions, which had an accuracy of 86.3%, the performance of the GPT4.0 RAG model demonstrated non-inferiority (p=0.610).
Conclusions: In this case study, we demonstrated a LLM-RAG model for healthcare implementation. The pipeline shows the advantages of grounded knowledge, upgradability, and scalability as important aspects of healthcare LLM deployment. - [632] arXiv:2402.01735 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: VIALM: A Survey and Benchmark of Visually Impaired Assistance with Large ModelsComments: under reviewSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Visually Impaired Assistance (VIA) aims to automatically help the visually impaired (VI) handle daily activities. The advancement of VIA primarily depends on developments in Computer Vision (CV) and Natural Language Processing (NLP), both of which exhibit cutting-edge paradigms with large models (LMs). Furthermore, LMs have shown exceptional multimodal abilities to tackle challenging physically-grounded tasks such as embodied robots. To investigate the potential and limitations of state-of-the-art (SOTA) LMs' capabilities in VIA applications, we present an extensive study for the task of VIA with LMs (VIALM). In this task, given an image illustrating the physical environments and a linguistic request from a VI user, VIALM aims to output step-by-step guidance to assist the VI user in fulfilling the request grounded in the environment. The study consists of a survey reviewing recent LM research and benchmark experiments examining selected LMs' capabilities in VIA. The results indicate that while LMs can potentially benefit VIA, their output cannot be well environment-grounded (i.e., 25.7% GPT-4's responses) and lacks fine-grained guidance (i.e., 32.1% GPT-4's responses).
- [633] arXiv:2402.01736 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: SADAS: A Dialogue Assistant System Towards Remediating Norm Violations in Bilingual Socio-Cultural ConversationsYuncheng Hua , Zhuang Li , Linhao Luo , Kadek Ananta Satriadi , Tao Feng , Haolan Zhan , Lizhen Qu , Suraj Sharma , Ingrid Zukerman , Zhaleh Semnani-Azad , Gholamreza HaffariComments: 8 pages, 2 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In today's globalized world, bridging the cultural divide is more critical than ever for forging meaningful connections. The Socially-Aware Dialogue Assistant System (SADAS) is our answer to this global challenge, and it's designed to ensure that conversations between individuals from diverse cultural backgrounds unfold with respect and understanding. Our system's novel architecture includes: (1) identifying the categories of norms present in the dialogue, (2) detecting potential norm violations, (3) evaluating the severity of these violations, (4) implementing targeted remedies to rectify the breaches, and (5) articulates the rationale behind these corrective actions. We employ a series of State-Of-The-Art (SOTA) techniques to build different modules, and conduct numerous experiments to select the most suitable backbone model for each of the modules. We also design a human preference experiment to validate the overall performance of the system. We will open-source our system (including source code, tools and applications), hoping to advance future research. A demo video of our system can be found at:~\url{ this https URL }. We have released our code and software at:~\url{ this https URL }.
- [634] arXiv:2402.01737 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Assistive Large Language Model Agents for Socially-Aware Negotiation DialoguesComments: 18 pages, 1 figure, 11 tables; Under review in IJCAI 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In this work, we aim to develop LLM agents to mitigate social norm violations in negotiations in a multi-agent setting. We simulate real-world negotiations by letting two large Language Models (LLMs) play the roles of two negotiators in each conversation. A third LLM acts as a remediation agent to rewrite utterances violating norms for improving negotiation outcomes. As it is a novel task, no manually constructed data is available. To address this limitation, we introduce a value impact based In-Context Learning (ICL) method to identify high-quality ICL examples for the LLM-based remediation agents, where the value impact function measures the quality of negotiation outcomes. We show the connection of this method to policy learning and provide rich empirical evidence to demonstrate its effectiveness in negotiations across three different topics: product sale, housing price, and salary negotiation. The source code and the generated dataset will be publicly available upon acceptance.
- [635] arXiv:2402.01739 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: OpenMoE: An Early Effort on Open Mixture-of-Experts Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Abstract: To help the open-source community have a better understanding of Mixture-of-Experts (MoE) based large language models (LLMs), we train and release OpenMoE, a series of fully open-sourced and reproducible decoder-only MoE LLMs, ranging from 650M to 34B parameters and trained on up to over 1T tokens. Our investigation confirms that MoE-based LLMs can offer a more favorable cost-effectiveness trade-off than dense LLMs, highlighting the potential effectiveness for future LLM development.
One more important contribution of this study is an in-depth analysis of the routing mechanisms within our OpenMoE models, leading to three significant findings: Context-Independent Specialization, Early Routing Learning, and Drop-towards-the-End. We discovered that routing decisions in MoE models are predominantly based on token IDs, with minimal context relevance. The token-to-expert assignments are determined early in the pre-training phase and remain largely unchanged. This imperfect routing can result in performance degradation, particularly in sequential tasks like multi-turn conversations, where tokens appearing later in a sequence are more likely to be dropped. Finally, we rethink our design based on the above-mentioned observations and analysis. To facilitate future MoE LLM development, we propose potential strategies for mitigating the issues we found and further improving off-the-shelf MoE LLM designs. - [636] arXiv:2402.01740 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Compensatory Biases Under Cognitive Load: Reducing Selection Bias in Large Language ModelsComments: 27 pages, 23 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large Language Models (LLMs) like gpt-3.5-turbo and claude-instant-1.2 have become instrumental in interpreting and executing semantic-based tasks. Unfortunately, these models' inherent biases, akin to human cognitive biases, adversely affect their performance. Particularly affected is object selection from lists; a fundamental operation in digital navigation and decision-making. This research critically examines these biases and quantifies the effects on a representative list selection task. To explore these biases, we conducted a series of controlled experiments, manipulating temperature, list length, object identity, object type, prompt complexity, and model. This enabled us to isolate and measure the influence of the biases on selection behavior. Our findings show that bias structure is strongly dependent on the model, with object type modulating the magnitude of the effect. With a strong primacy effect, causing the first objects in a list to be disproportionately represented in outputs. Furthermore the usage of guard rails, a prompt engineering method of ensuring a response structure, can increase bias and decrease instruction adherence when combined with a selection task. The bias is ablated when the guard rail step is separated from the list sampling step, lowering the complexity of each individual task. The implications of this research are two-fold, practically providing a guide for designing unbiased LLM applications and theoretically suggesting that LLMs experience a form of cognitive load compensated for by increasing bias.
- [637] arXiv:2402.01741 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Development and Testing of a Novel Large Language Model-Based Clinical Decision Support Systems for Medication Safety in 12 Clinical SpecialtiesJasmine Chiat Ling Ong , Liyuan Jin , Kabilan Elangovan , Gilbert Yong San Lim , Daniel Yan Zheng Lim , Gerald Gui Ren Sng , Yuhe Ke , Joshua Yi Min Tung , Ryan Jian Zhong , Christopher Ming Yao Koh , Keane Zhi Hao Lee , Xiang Chen , Jack Kian Chng , Aung Than , Ken Junyang Goh , Daniel Shu Wei TingSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Importance: We introduce a novel Retrieval Augmented Generation (RAG)-Large Language Model (LLM) framework as a Clinical Decision Support Systems (CDSS) to support safe medication prescription.
Objective: To evaluate the efficacy of LLM-based CDSS in correctly identifying medication errors in different patient case vignettes from diverse medical and surgical sub-disciplines, against a human expert panel derived ground truth. We compared performance for under 2 different CDSS practical healthcare integration modalities: LLM-based CDSS alone (fully autonomous mode) vs junior pharmacist + LLM-based CDSS (co-pilot, assistive mode).
Design, Setting, and Participants: Utilizing a RAG model with state-of-the-art medically-related LLMs (GPT-4, Gemini Pro 1.0 and Med-PaLM 2), this study used 61 prescribing error scenarios embedded into 23 complex clinical vignettes across 12 different medical and surgical specialties. A multidisciplinary expert panel assessed these cases for Drug-Related Problems (DRPs) using the PCNE classification and graded severity / potential for harm using revised NCC MERP medication error index. We compared.
Results RAG-LLM performed better compared to LLM alone. When employed in a co-pilot mode, accuracy, recall, and F1 scores were optimized, indicating effectiveness in identifying moderate to severe DRPs. The accuracy of DRP detection with RAG-LLM improved in several categories but at the expense of lower precision.
Conclusions This study established that a RAG-LLM based CDSS significantly boosts the accuracy of medication error identification when used alongside junior pharmacists (co-pilot), with notable improvements in detecting severe DRPs. This study also illuminates the comparative performance of current state-of-the-art LLMs in RAG-based CDSS systems. - [638] arXiv:2402.01742 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Towards Optimizing the Costs of LLM UsageComments: 8 pages + Appendix, Total 12 pagesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Generative AI and LLMs in particular are heavily used nowadays for various document processing tasks such as question answering and summarization. However, different LLMs come with different capabilities for different tasks as well as with different costs, tokenization, and latency. In fact, enterprises are already incurring huge costs of operating or using LLMs for their respective use cases.
In this work, we propose optimizing the usage costs of LLMs by estimating their output quality (without actually invoking the LLMs), and then solving an optimization routine for the LLM selection to either keep costs under a budget, or minimize the costs, in a quality and latency aware manner. We propose a model to predict the output quality of LLMs on document processing tasks like summarization, followed by an LP rounding algorithm to optimize the selection of LLMs. We study optimization problems trading off the quality and costs, both theoretically and empirically. We further propose a sentence simplification model for reducing the number of tokens in a controlled manner. Additionally, we propose several deterministic heuristics for reducing tokens in a quality aware manner, and study the related optimization problem of applying the heuristics optimizing the quality and cost trade-off. We perform extensive empirical validation of our methods on not only enterprise datasets but also on open-source datasets, annotated by us, and show that we perform much better compared to closest baselines. Our methods reduce costs by 40%- 90% while improving quality by 4%-7%. We will release the annotated open source datasets to the community for further research and exploration. - [639] arXiv:2402.01743 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: The Reasoning Under Uncertainty Trap: A Structural AI RiskComments: 51 pages (excluding references), 7 chapters, 9 figuresSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: This report examines a novel risk associated with current (and projected) AI tools. Making effective decisions about future actions requires us to reason under uncertainty (RUU), and doing so is essential to many critical real world problems. Overfaced by this challenge, there is growing demand for AI tools like LLMs to assist decision-makers. Having evidenced this demand and the incentives behind it, we expose a growing risk: we 1) do not currently sufficiently understand LLM capabilities in this regard, and 2) have no guarantees of performance given fundamental computational explosiveness and deep uncertainty constraints on accuracy. This report provides an exposition of what makes RUU so challenging for both humans and machines, and relates these difficulties to prospective AI timelines and capabilities. Having established this current potential misuse risk, we go on to expose how this seemingly additive risk (more misuse additively contributed to potential harm) in fact has multiplicative properties. Specifically, we detail how this misuse risk connects to a wider network of underlying structural risks (e.g., shifting incentives, limited transparency, and feedback loops) to produce non-linear harms. We go on to provide a solutions roadmap that targets multiple leverage points in the structure of the problem. This includes recommendations for all involved actors (prospective users, developers, and policy-makers) and enfolds insights from areas including Decision-making Under Deep Uncertainty and complex systems theory. We argue this report serves not only to raise awareness (and subsequently mitigate/correct) of a current, novel AI risk, but also awareness of the underlying class of structural risks by illustrating how their interconnected nature poses twin-dangers of camouflaging their presence, whilst amplifying their potential effects.
- [640] arXiv:2402.01744 (cross-list from q-bio.QM) [ pdf , ps , html , other ]
-
Title: Unveiling Molecular Moieties through Hierarchical Graph ExplainabilitySubjects: Quantitative Methods (q-bio.QM) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Molecular Networks (q-bio.MN)
Abstract: Background: Graph Neural Networks (GNN) have emerged in very recent years as a powerful tool for supporting in silico Virtual Screening. In this work we present a GNN which uses Graph Convolutional architectures to achieve very accurate multi-target screening. We also devised a hierarchical Explainable Artificial Intelligence (XAI) technique to catch information directly at atom, ring, and whole molecule level by leveraging the message passing mechanism. In this way, we find the most relevant moieties involved in bioactivity prediction. Results: We report a state-of-the-art GNN classifier on twenty Cyclin-dependent Kinase targets in support of VS. Our classifier outperforms previous SOTA approaches proposed by the authors. Moreover, a CDK1-only high-sensitivity version of the GNN has been designed to use our explainer in order to avoid the inherent bias of multi-class models. The hierarchical explainer has been validated by an expert chemist on 19 approved drugs on CDK1. Our explainer provided information in accordance to the docking analysis for 17 out of the 19 test drugs. Conclusion: Our approach is a valid support for shortening both the screening and the hit-to-lead phase. Detailed knowledge about the molecular substructures that play a role in the inhibitory action, can help the computational chemist to gain insights into the pharmacophoric function of the molecule also for repurposing purposes. Scientific Contribution Statement: The core scientific innovation of our work is the use of a hierarchical XAI approach on a GNN trained for a ligand-based VS task. The application of the hierarchical explainer allows for eliciting also structural information...
- [641] arXiv:2402.01746 (cross-list from cs.CY) [ pdf , ps , html , other ]
-
Title: 3DG: A Framework for Using Generative AI for Handling Sparse Learner Performance Data From Intelligent Tutoring SystemsJournal-ref: LAK 2024: International Workshop on Generative AI for Learning Analytics (GenAI-LA)Subjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Learning performance data (e.g., quiz scores and attempts) is significant for understanding learner engagement and knowledge mastery level. However, the learning performance data collected from Intelligent Tutoring Systems (ITSs) often suffers from sparsity, impacting the accuracy of learner modeling and knowledge assessments. To address this, we introduce the 3DG framework (3-Dimensional tensor for Densification and Generation), a novel approach combining tensor factorization with advanced generative models, including Generative Adversarial Network (GAN) and Generative Pre-trained Transformer (GPT), for enhanced data imputation and augmentation. The framework operates by first representing the data as a three-dimensional tensor, capturing dimensions of learners, questions, and attempts. It then densifies the data through tensor factorization and augments it using Generative AI models, tailored to individual learning patterns identified via clustering. Applied to data from an AutoTutor lesson by the Center for the Study of Adult Literacy (CSAL), the 3DG framework effectively generated scalable, personalized simulations of learning performance. Comparative analysis revealed GAN's superior reliability over GPT-4 in this context, underscoring its potential in addressing data sparsity challenges in ITSs and contributing to the advancement of personalized educational technology.
- [642] arXiv:2402.01748 (cross-list from cs.NI) [ pdf , ps , other ]
-
Title: Large Multi-Modal Models (LMMs) as Universal Foundation Models for AI-Native Wireless SystemsShengzhe Xu , Christo Kurisummoottil Thomas , Omar Hashash , Nikhil Muralidhar , Walid Saad , Naren RamakrishnanSubjects: Networking and Internet Architecture (cs.NI) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Large language models (LLMs) and foundation models have been recently touted as a game-changer for 6G systems. However, recent efforts on LLMs for wireless networks are limited to a direct application of existing language models that were designed for natural language processing (NLP) applications. To address this challenge and create wireless-centric foundation models, this paper presents a comprehensive vision on how to design universal foundation models that are tailored towards the deployment of artificial intelligence (AI)-native networks. Diverging from NLP-based foundation models, the proposed framework promotes the design of large multi-modal models (LMMs) fostered by three key capabilities: 1) processing of multi-modal sensing data, 2) grounding of physical symbol representations in real-world wireless systems using causal reasoning and retrieval-augmented generation (RAG), and 3) enabling instructibility from the wireless environment feedback to facilitate dynamic network adaptation thanks to logical and mathematical reasoning facilitated by neuro-symbolic AI. In essence, these properties enable the proposed LMM framework to build universal capabilities that cater to various cross-layer networking tasks and alignment of intents across different domains. Preliminary results from experimental evaluation demonstrate the efficacy of grounding using RAG in LMMs, and showcase the alignment of LMMs with wireless system designs. Furthermore, the enhanced rationale exhibited in the responses to mathematical questions by LMMs, compared to vanilla LLMs, demonstrates the logical and mathematical reasoning capabilities inherent in LMMs. Building on those results, we present a sequel of open questions and challenges for LMMs. We then conclude with a set of recommendations that ignite the path towards LMM-empowered AI-native systems.
- [643] arXiv:2402.01749 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Towards Urban General Intelligence: A Review and Outlook of Urban Foundation ModelsSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Machine learning techniques are now integral to the advancement of intelligent urban services, playing a crucial role in elevating the efficiency, sustainability, and livability of urban environments. The recent emergence of foundation models such as ChatGPT marks a revolutionary shift in the fields of machine learning and artificial intelligence. Their unparalleled capabilities in contextual understanding, problem solving, and adaptability across a wide range of tasks suggest that integrating these models into urban domains could have a transformative impact on the development of smart cities. Despite growing interest in Urban Foundation Models~(UFMs), this burgeoning field faces challenges such as a lack of clear definitions, systematic reviews, and universalizable solutions. To this end, this paper first introduces the concept of UFM and discusses the unique challenges involved in building them. We then propose a data-centric taxonomy that categorizes current UFM-related works, based on urban data modalities and types. Furthermore, to foster advancement in this field, we present a promising framework aimed at the prospective realization of UFMs, designed to overcome the identified challenges. Additionally, we explore the application landscape of UFMs, detailing their potential impact in various urban contexts. Relevant papers and open-source resources have been collated and are continuously updated at this https URL .
- [644] arXiv:2402.01750 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: PACE: A Pragmatic Agent for Enhancing Communication Efficiency Using Large Language ModelsComments: 11 pages,11 figures, submitted to IJCAI 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Current communication technologies face limitations in terms of theoretical capacity, spectrum availability, and power resources. Pragmatic communication, leveraging terminal intelligence for selective data transmission, offers resource conservation. Existing research lacks universal intention resolution tools, limiting applicability to specific tasks. This paper proposes an image pragmatic communication framework based on a Pragmatic Agent for Communication Efficiency (PACE) using Large Language Models (LLM). In this framework, PACE sequentially performs semantic perception, intention resolution, and intention-oriented coding. To ensure the effective utilization of LLM in communication, a knowledge base is designed to supplement the necessary knowledge, dedicated prompts are introduced to facilitate understanding of pragmatic communication scenarios and task requirements, and a chain of thought is designed to assist in making reasonable trade-offs between transmission efficiency and cost. For experimental validation, this paper constructs an image pragmatic communication dataset along with corresponding evaluation standards. Simulation results indicate that the proposed method outperforms traditional and non-LLM-based pragmatic communication in terms of transmission efficiency.
- [645] arXiv:2402.01751 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Performance Assessment of ChatGPT vs Bard in Detecting Alzheimer's DementiaComments: 22 pagesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large language models (LLMs) find increasing applications in many fields. Here, three LLM chatbots (ChatGPT-3.5, ChatGPT-4 and Bard) are assessed - in their current form, as publicly available - for their ability to recognize Alzheimer's Dementia (AD) and Cognitively Normal (CN) individuals using textual input derived from spontaneous speech recordings. Zero-shot learning approach is used at two levels of independent queries, with the second query (chain-of-thought prompting) eliciting more detailed than the first. Each LLM chatbot's performance is evaluated on the prediction generated in terms of accuracy, sensitivity, specificity, precision and F1 score. LLM chatbots generated three-class outcome ("AD", "CN", or "Unsure"). When positively identifying AD, Bard produced highest true-positives (89% recall) and highest F1 score (71%), but tended to misidentify CN as AD, with high confidence (low "Unsure" rates); for positively identifying CN, GPT-4 resulted in the highest true-negatives at 56% and highest F1 score (62%), adopting a diplomatic stance (moderate "Unsure" rates). Overall, three LLM chatbots identify AD vs CN surpassing chance-levels but do not currently satisfy clinical application.
- [646] arXiv:2402.01752 (cross-list from eess.AS) [ pdf , ps , other ]
-
Title: Identifying False Content and Hate Speech in Sinhala YouTube Videos by Analyzing the AudioW. A. K. M. Wickramaarachchi , Sameeri Sathsara Subasinghe , K. K. Rashani Tharushika Wijerathna , A. Sahashra Udani Athukorala , Lakmini Abeywardhana , A. KarunasenaSubjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
Abstract: YouTube faces a global crisis with the dissemination of false information and hate speech. To counter these issues, YouTube has implemented strict rules against uploading content that includes false information or promotes hate speech. While numerous studies have been conducted to reduce offensive English-language content, there's a significant lack of research on Sinhala content. This study aims to address the aforementioned gap by proposing a solution to minimize the spread of violence and misinformation in Sinhala YouTube videos. The approach involves developing a rating system that assesses whether a video contains false information by comparing the title and description with the audio content and evaluating whether the video includes hate speech. The methodology encompasses several steps, including audio extraction using the Pytube library, audio transcription via the fine-tuned Whisper model, hate speech detection employing the distilroberta-base model and a text classification LSTM model, and text summarization through the fine-tuned BART-Large- XSUM model. Notably, the Whisper model achieved a 48.99\% word error rate, while the distilroberta-base model demonstrated an F1 score of 0.856 and a recall value of 0.861 in comparison to the LSTM model, which exhibited signs of overfitting.
- [647] arXiv:2402.01758 (cross-list from cs.CY) [ pdf , ps , html , other ]
-
Title: Aalap: AI Assistant for Legal & Paralegal Functions in IndiaAman Tiwari , Prathamesh Kalamkar , Atreyo Banerjee , Saurabh Karn , Varun Hemachandran , Smita GuptaSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Using proprietary Large Language Models on legal tasks poses challenges due to data privacy issues, domain data heterogeneity, domain knowledge sophistication, and domain objectives uniqueness. We created Aalalp, a fine-tuned Mistral 7B model on instructions data related to specific Indian legal tasks. The performance of Aalap is better than gpt-3.5-turbo in 31\% of our test data and obtains an equivalent score in 34\% of the test data as evaluated by GPT4. Training Aalap mainly focuses on teaching legal reasoning rather than legal recall. Aalap is definitely helpful for the day-to-day activities of lawyers, judges, or anyone working in legal systems.
- [648] arXiv:2402.01759 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Systematic Literature Review: Computational Approaches for Humour Style ClassificationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Understanding various humour styles is essential for comprehending the multifaceted nature of humour and its impact on fields such as psychology and artificial intelligence. This understanding has revealed that humour, depending on the style employed, can either have therapeutic or detrimental effects on an individual's health and relationships. Although studies dedicated exclusively to computational-based humour style analysis remain somewhat rare, an expansive body of research thrives within related task, particularly binary humour and sarcasm recognition. In this systematic literature review (SLR), we survey the landscape of computational techniques applied to these related tasks and also uncover their fundamental relevance to humour style analysis. Through this study, we unveil common approaches, illuminate various datasets and evaluation metrics, and effectively navigate the complex terrain of humour research. Our efforts determine potential research gaps and outlined promising directions. Furthermore, the SLR identifies a range of features and computational models that can seamlessly transition from related tasks like binary humour and sarcasm detection to invigorate humour style classification. These features encompass incongruity, sentiment and polarity analysis, ambiguity detection, acoustic nuances, visual cues, contextual insights, and more. The computational models that emerge contain traditional machine learning paradigms, neural network architectures, transformer-based models, and specialised models attuned to the nuances of humour. Finally, the SLR provides access to existing datasets related to humour and sarcasm, facilitating the work of future researchers.
- [649] arXiv:2402.01760 (cross-list from cs.CY) [ pdf , ps , html , other ]
-
Title: Trust and ethical considerations in a multi-modal, explainable AI-driven chatbot tutoring system: The case of collaboratively solving Rubik's CubeKausik Lakkaraju , Vedant Khandelwal , Biplav Srivastava , Forest Agostinelli , Hengtao Tang , Prathamjeet Singh , Dezhi Wu , Matt Irvin , Ashish KunduComments: Accepted at 'Neural Conversational AI Workshop - What's left to TEACH (Trustworthy, Enhanced, Adaptable, Capable, and Human-centric) chatbots?' at ICML 2023Subjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: Artificial intelligence (AI) has the potential to transform education with its power of uncovering insights from massive data about student learning patterns. However, ethical and trustworthy concerns of AI have been raised but are unsolved. Prominent ethical issues in high school AI education include data privacy, information leakage, abusive language, and fairness. This paper describes technological components that were built to address ethical and trustworthy concerns in a multi-modal collaborative platform (called ALLURE chatbot) for high school students to collaborate with AI to solve the Rubik's cube. In data privacy, we want to ensure that the informed consent of children, parents, and teachers, is at the center of any data that is managed. Since children are involved, language, whether textual, audio, or visual, is acceptable both from users and AI and the system can steer interaction away from dangerous situations. In information management, we also want to ensure that the system, while learning to improve over time, does not leak information about users from one group to another.
- [650] arXiv:2402.01761 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Rethinking Interpretability in the Era of Large Language ModelsComments: 7 pagesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Interpretable machine learning has exploded as an area of interest over the last decade, sparked by the rise of increasingly large datasets and deep neural networks. Simultaneously, large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks, offering a chance to rethink opportunities in interpretable machine learning. Notably, the capability to explain in natural language allows LLMs to expand the scale and complexity of patterns that can be given to a human. However, these new capabilities raise new challenges, such as hallucinated explanations and immense computational costs.
In this position paper, we start by reviewing existing methods to evaluate the emerging field of LLM interpretation (both interpreting LLMs and using LLMs for explanation). We contend that, despite their limitations, LLMs hold the opportunity to redefine interpretability with a more ambitious scope across many applications, including in auditing LLMs themselves. We highlight two emerging research priorities for LLM interpretation: using LLMs to directly analyze new datasets and to generate interactive explanations. - [651] arXiv:2402.01762 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Commercial AI, Conflict, and Moral Responsibility: A theoretical analysis and practical approach to the moral responsibilities associated with dual-use AI technologyComments: 9 pagesSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: This paper presents a theoretical analysis and practical approach to the moral responsibilities when developing AI systems for non-military applications that may nonetheless be used for conflict applications. We argue that AI represents a form of crossover technology that is different from previous historical examples of dual- or multi-use technology as it has a multiplicative effect across other technologies. As a result, existing analyses of ethical responsibilities around dual-use technologies do not necessarily work for AI systems. We instead argue that stakeholders involved in the AI system lifecycle are morally responsible for uses of their systems that are reasonably foreseeable. The core idea is that an agent's moral responsibility for some action is not necessarily determined by their intentions alone; we must also consider what the agent could reasonably have foreseen to be potential outcomes of their action, such as the potential use of a system in conflict even when it is not designed for that. In particular, we contend that it is reasonably foreseeable that: (1) civilian AI systems will be applied to active conflict, including conflict support activities, (2) the use of civilian AI systems in conflict will impact applications of the law of armed conflict, and (3) crossover AI technology will be applied to conflicts that fall short of armed conflict. Given these reasonably foreseeably outcomes, we present three technically feasible actions that developers of civilian AIs can take to potentially mitigate their moral responsibility: (a) establishing systematic approaches to multi-perspective capability testing, (b) integrating digital watermarking in model weight matrices, and (c) utilizing monitoring and reporting mechanisms for conflict-related AI applications.
- [652] arXiv:2402.01763 (cross-list from cs.DB) [ pdf , ps , html , other ]
-
Title: When Large Language Models Meet Vector Databases: A SurveySubjects: Databases (cs.DB) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: This survey explores the synergistic potential of Large Language Models (LLMs) and Vector Databases (VecDBs), a burgeoning but rapidly evolving research area. With the proliferation of LLMs comes a host of challenges, including hallucinations, outdated knowledge, prohibitive commercial application costs, and memory issues. VecDBs emerge as a compelling solution to these issues by offering an efficient means to store, retrieve, and manage the high-dimensional vector representations intrinsic to LLM operations. Through this nuanced review, we delineate the foundational principles of LLMs and VecDBs and critically analyze their integration's impact on enhancing LLM functionalities. This discourse extends into a discussion on the speculative future developments in this domain, aiming to catalyze further research into optimizing the confluence of LLMs and VecDBs for advanced data handling and knowledge extraction capabilities.
- [653] arXiv:2402.01765 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: LLMs Simulate Big Five Personality Traits: Further EvidenceSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: An empirical investigation into the simulation of the Big Five personality traits by large language models (LLMs), namely Llama2, GPT4, and Mixtral, is presented. We analyze the personality traits simulated by these models and their stability. This contributes to the broader understanding of the capabilities of LLMs to simulate personality traits and the respective implications for personalized human-computer interaction.
- [654] arXiv:2402.01766 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: LLM Voting: Human Choices and AI Collective Decision MakingComments: Submitted to ICML2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG); General Economics (econ.GN)
Abstract: This paper investigates the voting behaviors of Large Language Models (LLMs), particularly OpenAI's GPT4 and LLaMA2, and their alignment with human voting patterns. Our approach included a human voting experiment to establish a baseline for human preferences and a parallel experiment with LLM agents. The study focused on both collective outcomes and individual preferences, revealing differences in decision-making and inherent biases between humans and LLMs. We observed a trade-off between preference diversity and alignment in LLMs, with a tendency towards more uniform choices as compared to the diverse preferences of human voters. This finding indicates that LLMs could lead to more homogenized collective outcomes when used in voting assistance, underscoring the need for cautious integration of LLMs into democratic processes.
- [655] arXiv:2402.01767 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: HiQA: A Hierarchical Contextual Augmentation RAG for Massive Documents QASubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: As language model agents leveraging external tools rapidly evolve, significant progress has been made in question-answering(QA) methodologies utilizing supplementary documents and the Retrieval-Augmented Generation (RAG) approach. This advancement has improved the response quality of language models and alleviates the appearance of hallucination. However, these methods exhibit limited retrieval accuracy when faced with massive indistinguishable documents, presenting notable challenges in their practical application. In response to these emerging challenges, we present HiQA, an advanced framework for multi-document question-answering (MDQA) that integrates cascading metadata into content as well as a multi-route retrieval mechanism. We also release a benchmark called MasQA to evaluate and research in MDQA. Finally, HiQA demonstrates the state-of-the-art performance in multi-document environments.
- [656] arXiv:2402.01769 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Redefining "Hallucination" in LLMs: Towards a psychology-informed framework for mitigating misinformationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In recent years, large language models (LLMs) have become incredibly popular, with ChatGPT for example being used by over a billion users. While these models exhibit remarkable language understanding and logical prowess, a notable challenge surfaces in the form of "hallucinations." This phenomenon results in LLMs outputting misinformation in a confident manner, which can lead to devastating consequences with such a large user base. However, we question the appropriateness of the term "hallucination" in LLMs, proposing a psychological taxonomy based on cognitive biases and other psychological phenomena. Our approach offers a more fine-grained understanding of this phenomenon, allowing for targeted solutions. By leveraging insights from how humans internally resolve similar challenges, we aim to develop strategies to mitigate LLM hallucinations. This interdisciplinary approach seeks to move beyond conventional terminology, providing a nuanced understanding and actionable pathways for improvement in LLM reliability.
- [657] arXiv:2402.01770 (cross-list from cs.CY) [ pdf , ps , html , other ]
-
Title: Extending Interactive Science Exhibits into the Classroom using Anthropomorphized Chatbots and Bloom's TaxonomySubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: This study explores the use of Generative AI chatbots for transforming public science exhibits into virtual experiences that can extend the engagement of exhibits into the classroom. The broader goal is to increase accessibility of science exhibits, especially for those marginalized in STEM due to various factors, including cultural barriers. We hypothesize that turning exhibits into first-person anthropomorphized chatbots with a personality, like quirky-talking asteroids or comets, can increase engagement and learning. The paper mainly explores if such techniques are possible using Generative AI (e.g. GPT) via prompt engineering alone. The research includes an investigation into the possibility of integrating interactive assessment via question-generation using Bloom's Taxonomy. Initial results indicate that it is possible to combine these techniques. As such, it lays a foundation for future classroom evaluations of such chatbots to gauge their overall efficacy in extending the reach of science exhibitions. The paper concludes by discussing extensions of the research to fully evaluate effectiveness in virtual field-trips. We also include a brief examination of additional ways to enhance student motivation towards learning via chatbots.
- [658] arXiv:2402.01771 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: BlackMamba: Mixture of Experts for State-Space ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Abstract: State-space models (SSMs) have recently demonstrated competitive performance to transformers at large-scale language modeling benchmarks while achieving linear time and memory complexity as a function of sequence length. Mamba, a recently released SSM model, shows impressive performance in both language modeling and long sequence processing tasks. Simultaneously, mixture-of-expert (MoE) models have shown remarkable performance while significantly reducing the compute and latency costs of inference at the expense of a larger memory footprint. In this paper, we present BlackMamba, a novel architecture that combines the Mamba SSM with MoE to obtain the benefits of both. We demonstrate that BlackMamba performs competitively against both Mamba and transformer baselines, and outperforms in inference and training FLOPs. We fully train and open-source 340M/1.5B and 630M/2.8B BlackMamba models on 300B tokens of a custom dataset. We show that BlackMamba inherits and combines both of the benefits of SSM and MoE architectures, combining linear-complexity generation from SSM with cheap and fast inference from MoE. We release all weights, checkpoints, and inference code open-source. Inference code at: this https URL
- [659] arXiv:2402.01772 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Disentangling the Roles of Target-Side Transfer and Regularization in Multilingual Machine TranslationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Multilingual Machine Translation (MMT) benefits from knowledge transfer across different language pairs. However, improvements in one-to-many translation compared to many-to-one translation are only marginal and sometimes even negligible. This performance discrepancy raises the question of to what extent positive transfer plays a role on the target-side for one-to-many MT. In this paper, we conduct a large-scale study that varies the auxiliary target side languages along two dimensions, i.e., linguistic similarity and corpus size, to show the dynamic impact of knowledge transfer on the main language pairs. We show that linguistically similar auxiliary target languages exhibit strong ability to transfer positive knowledge. With an increasing size of similar target languages, the positive transfer is further enhanced to benefit the main language pairs. Meanwhile, we find distant auxiliary target languages can also unexpectedly benefit main language pairs, even with minimal positive transfer ability. Apart from transfer, we show distant auxiliary target languages can act as a regularizer to benefit translation performance by enhancing the generalization and model inference calibration.
- [660] arXiv:2402.01777 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: On the Psychology of GPT-4: Moderately anxious, slightly masculine, honest, and humbleComments: 16 pages, 8 tables, 1 code repositorySubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Abstract: We subject GPT-4 to a number of rigorous psychometric tests and analyze the results. We find that, compared to the average human, GPT-4 tends to show more honesty and humility, and less machiavellianism and narcissism. It sometimes exhibits ambivalent sexism, leans slightly toward masculinity, is moderately anxious but mostly not depressive (but not always). It shows human-average numerical literacy and has cognitive reflection abilities that are above human average for verbal tasks.
- [661] arXiv:2402.01781 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model LeaderboardsNorah Alzahrani , Hisham Abdullah Alyahya , Yazeed Alnumay , Sultan Alrashed , Shaykhah Alsubaie , Yusef Almushaykeh , Faisal Mirza , Nouf Alotaibi , Nora Altwairesh , Areeb Alowisheq , M Saiful Bari , Haidar KhanSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Large Language Model (LLM) leaderboards based on benchmark rankings are regularly used to guide practitioners in model selection. Often, the published leaderboard rankings are taken at face value - we show this is a (potentially costly) mistake. Under existing leaderboards, the relative performance of LLMs is highly sensitive to (often minute) details. We show that for popular multiple choice question benchmarks (e.g. MMLU) minor perturbations to the benchmark, such as changing the order of choices or the method of answer selection, result in changes in rankings up to 8 positions. We explain this phenomenon by conducting systematic experiments over three broad categories of benchmark perturbations and identifying the sources of this behavior. Our analysis results in several best-practice recommendations, including the advantage of a hybrid scoring method for answer selection. Our study highlights the dangers of relying on simple benchmark evaluations and charts the path for more robust evaluation schemes on the existing benchmarks.
- [662] arXiv:2402.01782 (cross-list from cs.NE) [ pdf , ps , html , other ]
-
Title: Benchmarking Spiking Neural Network Learning Methods with Varying LocalitySubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: Spiking Neural Networks (SNNs), providing more realistic neuronal dynamics, have shown to achieve performance comparable to Artificial Neural Networks (ANNs) in several machine learning tasks. Information is processed as spikes within SNNs in an event-based mechanism that significantly reduces energy consumption. However, training SNNs is challenging due to the non-differentiable nature of the spiking mechanism. Traditional approaches, such as Backpropagation Through Time (BPTT), have shown effectiveness but comes with additional computational and memory costs and are biologically implausible. In contrast, recent works propose alternative learning methods with varying degrees of locality, demonstrating success in classification tasks. In this work, we show that these methods share similarities during the training process, while they present a trade-off between biological plausibility and performance. Further, this research examines the implicitly recurrent nature of SNNs and investigates the influence of addition of explicit recurrence to SNNs. We experimentally prove that the addition of explicit recurrent weights enhances the robustness of SNNs. We also investigate the performance of local learning methods under gradient and non-gradient based adversarial attacks.
- [663] arXiv:2402.01783 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Hierarchical Multi-Label Classification of Online Vaccine ConcernsComments: Published in AAAI 2024 Health Intelligence workshopSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Vaccine concerns are an ever-evolving target, and can shift quickly as seen during the COVID-19 pandemic. Identifying longitudinal trends in vaccine concerns and misinformation might inform the healthcare space by helping public health efforts strategically allocate resources or information campaigns. We explore the task of detecting vaccine concerns in online discourse using large language models (LLMs) in a zero-shot setting without the need for expensive training datasets. Since real-time monitoring of online sources requires large-scale inference, we explore cost-accuracy trade-offs of different prompting strategies and offer concrete takeaways that may inform choices in system designs for current applications. An analysis of different prompting strategies reveals that classifying the concerns over multiple passes through the LLM, each consisting a boolean question whether the text mentions a vaccine concern or not, works the best. Our results indicate that GPT-4 can strongly outperform crowdworker accuracy when compared to ground truth annotations provided by experts on the recently introduced VaxConcerns dataset, achieving an overall F1 score of 78.7%.
- [664] arXiv:2402.01785 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: DoubleMLDeep: Estimation of Causal Effects with Multimodal DataSven Klaassen , Jan Teichert-Kluge , Philipp Bach , Victor Chernozhukov , Martin Spindler , Suhas VijaykumarSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Econometrics (econ.EM); Methodology (stat.ME); Machine Learning (stat.ML)
Abstract: This paper explores the use of unstructured, multimodal data, namely text and images, in causal inference and treatment effect estimation. We propose a neural network architecture that is adapted to the double machine learning (DML) framework, specifically the partially linear model. An additional contribution of our paper is a new method to generate a semi-synthetic dataset which can be used to evaluate the performance of causal effect estimation in the presence of text and images as confounders. The proposed methods and architectures are evaluated on the semi-synthetic dataset and compared to standard approaches, highlighting the potential benefit of using text and images directly in causal studies. Our findings have implications for researchers and practitioners in economics, marketing, finance, medicine and data science in general who are interested in estimating causal quantities using non-traditional data.
- [665] arXiv:2402.01787 (cross-list from cs.CY) [ pdf , ps , html , other ]
-
Title: Harm Amplification in Text-to-Image ModelsSusan Hao , Renee Shelby , Yuchi Liu , Hansa Srinivasan , Mukul Bhutani , Burcu Karagol Ayan , Shivani Poddar , Sarah LaszloSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Text-to-image (T2I) models have emerged as a significant advancement in generative AI; however, there exist safety concerns regarding their potential to produce harmful image outputs even when users input seemingly safe prompts. This phenomenon, where T2I models generate harmful representations that were not explicit in the input, poses a potentially greater risk than adversarial prompts, leaving users unintentionally exposed to harms. Our paper addresses this issue by first introducing a formal definition for this phenomenon, termed harm amplification. We further contribute to the field by developing methodologies to quantify harm amplification in which we consider the harm of the model output in the context of user input. We then empirically examine how to apply these different methodologies to simulate real-world deployment scenarios including a quantification of disparate impacts across genders resulting from harm amplification. Together, our work aims to offer researchers tools to comprehensively address safety challenges in T2I systems and contribute to the responsible deployment of generative AI models.
- [666] arXiv:2402.01788 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: LitLLM: A Toolkit for Scientific Literature ReviewSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract: Conducting literature reviews for scientific papers is essential for understanding research, its limitations, and building on existing work. It is a tedious task which makes an automatic literature review generator appealing. Unfortunately, many existing works that generate such reviews using Large Language Models (LLMs) have significant limitations. They tend to hallucinate-generate non-actual information-and ignore the latest research they have not been trained on. To address these limitations, we propose a toolkit that operates on Retrieval Augmented Generation (RAG) principles, specialized prompting and instructing techniques with the help of LLMs. Our system first initiates a web search to retrieve relevant papers by summarizing user-provided abstracts into keywords using an off-the-shelf LLM. Authors can enhance the search by supplementing it with relevant papers or keywords, contributing to a tailored retrieval process. Second, the system re-ranks the retrieved papers based on the user-provided abstract. Finally, the related work section is generated based on the re-ranked results and the abstract. There is a substantial reduction in time and effort for literature review compared to traditional methods, establishing our toolkit as an efficient alternative. Our open-source toolkit is accessible at this https URL and Huggingface space ( this https URL ) with the video demo at this https URL .
- [667] arXiv:2402.01789 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: The Political Preferences of LLMsSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: We report here a comprehensive analysis about the political preferences embedded in Large Language Models (LLMs). Namely, we administer 11 political orientation tests, designed to identify the political preferences of the test taker, to 24 state-of-the-art conversational LLMs, both close and open source. The results indicate that when probed with questions/statements with political connotations most conversational LLMs tend to generate responses that are diagnosed by most political test instruments as manifesting preferences for left-of-center viewpoints. We note that this is not the case for base (i.e. foundation) models upon which LLMs optimized for conversation with humans are built. However, base models' suboptimal performance at coherently answering questions suggests caution when interpreting their classification by political orientation tests. Though not conclusive, our results provide preliminary evidence for the intriguing hypothesis that the embedding of political preferences into LLMs might be happening mostly post-pretraining. Namely, during the supervised fine-tuning (SFT) and/or Reinforcement Learning (RL) stages of the conversational LLMs training pipeline. We provide further support for this hypothesis by showing that LLMs are easily steerable into target locations of the political spectrum via SFT requiring only modest compute and custom data, illustrating the ability of SFT to imprint political preferences onto LLMs. As LLMs have started to displace more traditional information sources such as search engines or Wikipedia, the implications of political biases embedded in LLMs has important societal ramifications.
- [668] arXiv:2402.01790 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: An introduction to graphical tensor notation for mechanistic interpretabilityComments: 30 pages, 75 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Graphical tensor notation is a simple way of denoting linear operations on tensors, originating from physics. Modern deep learning consists almost entirely of operations on or between tensors, so easily understanding tensor operations is quite important for understanding these systems. This is especially true when attempting to reverse-engineer the algorithms learned by a neural network in order to understand its behavior: a field known as mechanistic interpretability. It's often easy to get confused about which operations are happening between tensors and lose sight of the overall structure, but graphical tensor notation makes it easier to parse things at a glance and see interesting equivalences. The first half of this document introduces the notation and applies it to some decompositions (SVD, CP, Tucker, and tensor network decompositions), while the second half applies it to some existing some foundational approaches for mechanistically understanding language models, loosely following ``A Mathematical Framework for Transformer Circuits'', then constructing an example ``induction head'' circuit in graphical tensor notation.
- [669] arXiv:2402.01791 (cross-list from quant-ph) [ pdf , ps , html , other ]
-
Title: Variational Quantum Circuits Enhanced Generative Adversarial NetworkSubjects: Quantum Physics (quant-ph) ; Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Machine Learning (cs.LG)
Abstract: Generative adversarial network (GAN) is one of the widely-adopted machine-learning frameworks for a wide range of applications such as generating high-quality images, video, and audio contents. However, training a GAN could become computationally expensive for large neural networks. In this work, we propose a hybrid quantum-classical architecture for improving GAN (denoted as QC-GAN). The performance was examed numerically by benchmarking with a classical GAN using MindSpore Quantum on the task of hand-written image generation. The generator of the QC-GAN consists of a quantum variational circuit together with a one-layer neural network, and the discriminator consists of a traditional neural network. Leveraging the entangling and expressive power of quantum circuits, our hybrid architecture achieved better performance (Frechet Inception Distance) than the classical GAN, with much fewer training parameters and number of iterations for convergence. We have also demonstrated the superiority of QC-GAN over an alternative quantum GAN, namely pathGAN, which could hardly generate 16$\times$16 or larger images. This work demonstrates the value of combining ideas from quantum computing with machine learning for both areas of Quantum-for-AI and AI-for-Quantum.
- [670] arXiv:2402.01799 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Faster and Lighter LLMs: A Survey on Current Challenges and Way ForwardComments: Accepted at IJCAI '24 (Survey Track), Updated TGI resultsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Despite the impressive performance of LLMs, their widespread adoption faces challenges due to substantial computational and memory requirements during inference. Recent advancements in model compression and system-level optimization methods aim to enhance LLM inference. This survey offers an overview of these methods, emphasizing recent developments. Through experiments on LLaMA(/2)-7B, we evaluate various compression techniques, providing practical insights for efficient LLM deployment in a unified setting. The empirical analysis on LLaMA(/2)-7B highlights the effectiveness of these methods. Drawing from survey insights, we identify current limitations and discuss potential future directions to improve LLM inference efficiency. We release the codebase to reproduce the results presented in this paper at this https URL
- [671] arXiv:2402.01801 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Large Language Models for Time Series: A SurveyComments: GitHub repository: this https URLSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Large Language Models (LLMs) have seen significant use in domains such as natural language processing and computer vision. Going beyond text, image and graphics, LLMs present a significant potential for analysis of time series data, benefiting domains such as climate, IoT, healthcare, traffic, audio and finance. This survey paper provides an in-depth exploration and a detailed taxonomy of the various methodologies employed to harness the power of LLMs for time series analysis. We address the inherent challenge of bridging the gap between LLMs' original text data training and the numerical nature of time series data, and explore strategies for transferring and distilling knowledge from LLMs to numerical time series analysis. We detail various methodologies, including (1) direct prompting of LLMs, (2) time series quantization, (3) aligning techniques, (4) utilization of the vision modality as a bridging mechanism, and (5) the combination of LLMs with tools. Additionally, this survey offers a comprehensive overview of the existing multimodal time series and text datasets and delves into the challenges and future opportunities of this emerging field. We maintain an up-to-date Github repository which includes all the papers and datasets discussed in the survey.
- [672] arXiv:2402.01802 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: An Auction-based Marketplace for Model Trading in Federated LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)
Abstract: Federated learning (FL) is increasingly recognized for its efficacy in training models using locally distributed data. However, the proper valuation of shared data in this collaborative process remains insufficiently addressed. In this work, we frame FL as a marketplace of models, where clients act as both buyers and sellers, engaging in model trading. This FL market allows clients to gain monetary reward by selling their own models and improve local model performance through the purchase of others' models. We propose an auction-based solution to ensure proper pricing based on performance gain. Incentive mechanisms are designed to encourage clients to truthfully reveal their model valuations. Furthermore, we introduce a reinforcement learning (RL) framework for marketing operations, aiming to achieve maximum trading volumes under the dynamic and evolving market status. Experimental results on four datasets demonstrate that the proposed FL market can achieve high trading revenue and fair downstream task accuracy.
- [673] arXiv:2402.01804 (cross-list from cs.CY) [ pdf , ps , html , other ]
-
Title: Analysis of Internet of Things implementation barriers in the cold supply chain: an integrated ISM-MICMAC and DEMATEL approachSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: Integrating Internet of Things (IoT) technology inside the cold supply chain can enhance transparency, efficiency, and quality, optimizing operating procedures and increasing productivity. The integration of IoT in this complicated setting is hindered by specific barriers that need a thorough examination. Prominent barriers to IoT implementation in the cold supply chain are identified using a two-stage model. After reviewing the available literature on the topic of IoT implementation, a total of 13 barriers were found. The survey data was cross-validated for quality, and Cronbach's alpha test was employed to ensure validity. This research applies the interpretative structural modeling technique in the first phase to identify the main barriers. Among those barriers, "regularity compliance" and "cold chain networks" are key drivers for IoT adoption strategies. MICMAC's driving and dependence power element categorization helps evaluate the barrier interactions. In the second phase of this research, a decision-making trial and evaluation laboratory methodology was employed to identify causal relationships between barriers and evaluate them according to their relative importance. Each cause is a potential drive, and if its efficiency can be enhanced, the system as a whole benefits. The research findings provide industry stakeholders, governments, and organizations with significant drivers of IoT adoption to overcome these barriers and optimize the utilization of IoT technology to improve the effectiveness and reliability of the cold supply chain.
- [674] arXiv:2402.01805 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Can LLMs perform structured graph reasoning?Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Pretrained Large Language Models (LLMs) have demonstrated various reasoning capabilities through language-based prompts alone, particularly in unstructured task settings (tasks purely based on language semantics). However, LLMs often struggle with structured tasks, because of the inherent incompatibility of input representation. Reducing structured tasks to uni-dimensional language semantics often renders the problem trivial. Keeping the trade-off between LLM compatibility and structure complexity in mind, we design various graph reasoning tasks as a proxy to semi-structured tasks in this paper, in order to test the ability to navigate through representations beyond plain text in various LLMs. Particularly, we design 10 distinct problems of graph traversal, each representing increasing levels of complexity, and benchmark 5 different instruct-finetuned LLMs (GPT-4, GPT-3.5, Claude-2, Llama-2 and Palm-2) on the aforementioned tasks. Further, we analyse the performance of models across various settings such as varying sizes of graphs as well as different forms of k-shot prompting. We highlight various limitations, biases and properties of LLMs through this benchmarking process, such as an inverse relation to the average degrees of freedom of traversal per node in graphs, the overall negative impact of k-shot prompting on graph reasoning tasks, and a positive response bias which prevents LLMs from identifying the absence of a valid solution. Finally, we introduce a new prompting technique specially designed for graph traversal tasks (PathCompare), which demonstrates a notable increase in the performance of LLMs in comparison to standard prompting techniques such as Chain-of-Thought (CoT).
- [675] arXiv:2402.01806 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: HQA-Attack: Toward High Quality Black-Box Hard-Label Adversarial Attack on TextHan Liu , Zhi Xu , Xiaotong Zhang , Feng Zhang , Fenglong Ma , Hongyang Chen , Hong Yu , Xianchao ZhangSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Black-box hard-label adversarial attack on text is a practical and challenging task, as the text data space is inherently discrete and non-differentiable, and only the predicted label is accessible. Research on this problem is still in the embryonic stage and only a few methods are available. Nevertheless, existing methods rely on the complex heuristic algorithm or unreliable gradient estimation strategy, which probably fall into the local optimum and inevitably consume numerous queries, thus are difficult to craft satisfactory adversarial examples with high semantic similarity and low perturbation rate in a limited query budget. To alleviate above issues, we propose a simple yet effective framework to generate high quality textual adversarial examples under the black-box hard-label attack scenarios, named HQA-Attack. Specifically, after initializing an adversarial example randomly, HQA-attack first constantly substitutes original words back as many as possible, thus shrinking the perturbation rate. Then it leverages the synonym set of the remaining changed words to further optimize the adversarial example with the direction which can improve the semantic similarity and satisfy the adversarial condition simultaneously. In addition, during the optimizing procedure, it searches a transition synonym word for each changed word, thus avoiding traversing the whole synonym set and reducing the query number to some extent. Extensive experimental results on five text classification datasets, three natural language inference datasets and two real-world APIs have shown that the proposed HQA-Attack method outperforms other strong baselines significantly.
- [676] arXiv:2402.01812 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Distilling LLMs' Decomposition Abilities into Compact Language ModelsComments: this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Large Language Models (LLMs) have demonstrated proficiency in their reasoning abilities, yet their large size presents scalability challenges and limits any further customization. In contrast, compact models offer customized training but often fall short in solving complex reasoning tasks. This study focuses on distilling the LLMs' decomposition skills into compact models using offline reinforcement learning. We leverage the advancements in the LLM`s capabilities to provide feedback and generate a specialized task-specific dataset for training compact models. The development of an AI-generated dataset and the establishment of baselines constitute the primary contributions of our work, underscoring the potential of compact models in replicating complex problem-solving skills.
- [677] arXiv:2402.01821 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Ecologically rational meta-learned inference explains human category learningComments: 22 pages (8 pages of main text, 4 pages of references, and 10 pages of appendix), 7 figures, and 4 TablesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Ecological rationality refers to the notion that humans are rational agents adapted to their environment. However, testing this theory remains challenging due to two reasons: the difficulty in defining what tasks are ecologically valid and building rational models for these tasks. In this work, we demonstrate that large language models can generate cognitive tasks, specifically category learning tasks, that match the statistics of real-world tasks, thereby addressing the first challenge. We tackle the second challenge by deriving rational agents adapted to these tasks using the framework of meta-learning, leading to a class of models called ecologically rational meta-learned inference (ERMI). ERMI quantitatively explains human data better than seven other cognitive models in two different experiments. It additionally matches human behavior on a qualitative level: (1) it finds the same tasks difficult that humans find difficult, (2) it becomes more reliant on an exemplar-based strategy for assigning categories with learning, and (3) it generalizes to unseen stimuli in a human-like way. Furthermore, we show that ERMI's ecologically valid priors allow it to achieve state-of-the-art performance on the OpenML-CC18 classification benchmark.
- [678] arXiv:2402.01822 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Building Guardrails for Large Language ModelsYi Dong , Ronghui Mu , Gaojie Jin , Yi Qi , Jinwei Hu , Xingyu Zhao , Jie Meng , Wenjie Ruan , Xiaowei HuangComments: Under ReviewSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: As Large Language Models (LLMs) become more integrated into our daily lives, it is crucial to identify and mitigate their risks, especially when the risks can have profound impacts on human users and societies. Guardrails, which filter the inputs or outputs of LLMs, have emerged as a core safeguarding technology. This position paper takes a deep look at current open-source solutions (Llama Guard, Nvidia NeMo, Guardrails AI), and discusses the challenges and the road towards building more complete solutions. Drawing on robust evidence from previous research, we advocate for a systematic approach to construct guardrails for LLMs, based on comprehensive consideration of diverse contexts across various LLMs applications. We propose employing socio-technical methods through collaboration with a multi-disciplinary team to pinpoint precise technical requirements, exploring advanced neural-symbolic implementations to embrace the complexity of the requirements, and developing verification and testing to ensure the utmost quality of the final product.
- [679] arXiv:2402.01825 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Fractal Patterns May Unravel the Intelligence in Next-Token PredictionComments: 15 pages, 10 tables, 6 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: We study the fractal structure of language, aiming to provide a precise formalism for quantifying properties that may have been previously suspected but not formally shown. We establish that language is: (1) self-similar, exhibiting complexities at all levels of granularity, with no particular characteristic context length, and (2) long-range dependent (LRD), with a Hurst parameter of approximately H=0.70. Based on these findings, we argue that short-term patterns/dependencies in language, such as in paragraphs, mirror the patterns/dependencies over larger scopes, like entire documents. This may shed some light on how next-token prediction can lead to a comprehension of the structure of text at multiple levels of granularity, from words and clauses to broader contexts and intents. We also demonstrate that fractal parameters improve upon perplexity-based bits-per-byte (BPB) in predicting downstream performance. We hope these findings offer a fresh perspective on language and the mechanisms underlying the success of LLMs.
- [680] arXiv:2402.01826 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Leveraging Large Language Models for Analyzing Blood Pressure Variations Across Biological Sex from Scientific LiteratureSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Hypertension, defined as blood pressure (BP) that is above normal, holds paramount significance in the realm of public health, as it serves as a critical precursor to various cardiovascular diseases (CVDs) and significantly contributes to elevated mortality rates worldwide. However, many existing BP measurement technologies and standards might be biased because they do not consider clinical outcomes, comorbidities, or demographic factors, making them inconclusive for diagnostic purposes. There is limited data-driven research focused on studying the variance in BP measurements across these variables. In this work, we employed GPT-35-turbo, a large language model (LLM), to automatically extract the mean and standard deviation values of BP for both males and females from a dataset comprising 25 million abstracts sourced from PubMed. 993 article abstracts met our predefined inclusion criteria (i.e., presence of references to blood pressure, units of blood pressure such as mmHg, and mention of biological sex). Based on the automatically-extracted information from these articles, we conducted an analysis of the variations of BP values across biological sex. Our results showed the viability of utilizing LLMs to study the BP variations across different demographic factors.
- [681] arXiv:2402.01828 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Retrieval Augmented End-to-End Spoken Dialog ModelsJournal-ref: Proc. ICASSP 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract: We recently developed SLM, a joint speech and language model, which fuses a pretrained foundational speech model and a large language model (LLM), while preserving the in-context learning capability intrinsic to the pretrained LLM. In this paper, we apply SLM to speech dialog applications where the dialog states are inferred directly from the audio signal.
Task-oriented dialogs often contain domain-specific entities, i.e., restaurants, hotels, train stations, and city names, which are difficult to recognize, however, critical for the downstream applications. Inspired by the RAG (retrieval-augmented generation) paradigm, we propose a retrieval augmented SLM (ReSLM) that overcomes this weakness. We first train a speech retriever to retrieve text entities mentioned in the audio. The retrieved entities are then added as text inputs to the underlying SLM to bias model predictions. We evaluated ReSLM on speech MultiWoz task (DSTC-11 challenge), and found that this retrieval augmentation boosts model performance, achieving joint goal accuracy (38.6% vs 32.7%), slot error rate (20.6% vs 24.8%) and ASR word error rate (5.5% vs 6.7%). While demonstrated on dialog state tracking, our approach is broadly applicable to other speech tasks requiring contextual information or domain-specific entities, such as contextual ASR with biasing capability. - [682] arXiv:2402.01830 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: PiCO: Peer Review in LLMs based on the Consistency OptimizationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Existing large language models (LLMs) evaluation methods typically focus on testing the performance on some closed-environment and domain-specific benchmarks with human annotations. In this paper, we explore a novel unsupervised evaluation direction, utilizing peer-review mechanisms to measure LLMs automatically. In this setting, both open-source and closed-source LLMs lie in the same environment, capable of answering unlabeled questions and evaluating each other, where each LLM's response score is jointly determined by other anonymous ones. To obtain the ability hierarchy among these models, we assign each LLM a learnable capability parameter to adjust the final ranking. We formalize it as a constrained optimization problem, intending to maximize the consistency of each LLM's capabilities and scores. The key assumption behind is that high-level LLM can evaluate others' answers more accurately than low-level ones, while higher-level LLM can also achieve higher response scores. Moreover, we propose three metrics called PEN, CIN, and LIS to evaluate the gap in aligning human rankings. We perform experiments on multiple datasets with these metrics, validating the effectiveness of the proposed approach.
- [683] arXiv:2402.01832 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: SynthCLIP: Are We Ready for a Fully Synthetic CLIP Training?Comments: Under reviewSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: We present SynthCLIP, a novel framework for training CLIP models with entirely synthetic text-image pairs, significantly departing from previous methods relying on real data. Leveraging recent text-to-image (TTI) generative networks and large language models (LLM), we are able to generate synthetic datasets of images and corresponding captions at any scale, with no human intervention. With training at scale, SynthCLIP achieves performance comparable to CLIP models trained on real datasets. We also introduce SynthCI-30M, a purely synthetic dataset comprising 30 million captioned images. Our code, trained models, and generated data are released at this https URL
- [684] arXiv:2402.01841 (cross-list from cs.SE) [ pdf , ps , html , other ]
-
Title: COMET: Generating Commit Messages using Delta Graph Context RepresentationComments: 22 Pages, 7 FiguresSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Commit messages explain code changes in a commit and facilitate collaboration among developers. Several commit message generation approaches have been proposed; however, they exhibit limited success in capturing the context of code changes. We propose Comet (Context-Aware Commit Message Generation), a novel approach that captures context of code changes using a graph-based representation and leverages a transformer-based model to generate high-quality commit messages. Our proposed method utilizes delta graph that we developed to effectively represent code differences. We also introduce a customizable quality assurance module to identify optimal messages, mitigating subjectivity in commit messages. Experiments show that Comet outperforms state-of-the-art techniques in terms of bleu-norm and meteor metrics while being comparable in terms of rogue-l. Additionally, we compare the proposed approach with the popular gpt-3.5-turbo model, along with gpt-4-turbo; the most capable GPT model, over zero-shot, one-shot, and multi-shot settings. We found Comet outperforming the GPT models, on five and four metrics respectively and provide competitive results with the two other metrics. The study has implications for researchers, tool developers, and software developers. Software developers may utilize Comet to generate context-aware commit messages. Researchers and tool developers can apply the proposed delta graph technique in similar contexts, like code review summarization.
- [685] arXiv:2402.01849 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Capturing waste collection planning expert knowledge in a fitness function through preference learningJournal-ref: Engineering Applications of Artificial Intelligence 2021 Volume 99 104113Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: This paper copes with the COGERSA waste collection process. Up to now, experts have been manually designed the process using a trial and error mechanism. This process is not globally optimized, since it has been progressively and locally built as council demands appear. Planning optimization algorithms usually solve it, but they need a fitness function to evaluate a route planning quality. The drawback is that even experts are not able to propose one in a straightforward way due to the complexity of the process. Hence, the goal of this paper is to build a fitness function though a preference framework, taking advantage of the available expert knowledge and expertise. Several key performance indicators together with preference judgments are carefully established according to the experts for learning a promising fitness function. Particularly, the additivity property of them makes the task be much more affordable, since it allows to work with routes rather than with route plannings. Besides, a feature selection analysis is performed over such indicators, since the experts suspect of a potential existing (but unknown) redundancy among them. The experiment results confirm this hypothesis, since the best $C-$index ($98\%$ against around $94\%$) is reached when 6 or 8 out of 21 indicators are taken. Particularly, truck load seems to be a highly promising key performance indicator, together to the travelled distance along non-main roads. A comparison with other existing approaches shows that the proposed method clearly outperforms them, since the $C-$index goes from $72\%$ or $90\%$ to $98\%$.
- [686] arXiv:2402.01858 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Explaining latent representations of generative models with large multimodal modelsComments: ICLR 2024 Workshop on Reliable and Responsible Foundation ModelsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Learning interpretable representations of data generative latent factors is an important topic for the development of artificial intelligence. With the rise of the large multimodal model, it can align images with text to generate answers. In this work, we propose a framework to comprehensively explain each latent variable in the generative models using a large multimodal model. We further measure the uncertainty of our generated explanations, quantitatively evaluate the performance of explanation generation among multiple large multimodal models, and qualitatively visualize the variations of each latent variable to learn the disentanglement effects of different generative models on explanations. Finally, we discuss the explanatory capabilities and limitations of state-of-the-art large multimodal models.
- [687] arXiv:2402.01862 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Parametric Feature Transfer: One-shot Federated Learning with Foundation ModelsComments: 20 pages, 12 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In one-shot federated learning (FL), clients collaboratively train a global model in a single round of communication. Existing approaches for one-shot FL enhance communication efficiency at the expense of diminished accuracy. This paper introduces FedPFT (Federated Learning with Parametric Feature Transfer), a methodology that harnesses the transferability of foundation models to enhance both accuracy and communication efficiency in one-shot FL. The approach involves transferring per-client parametric models (specifically, Gaussian mixtures) of features extracted from foundation models. Subsequently, each parametric model is employed to generate synthetic features for training a classifier head. Experimental results on eight datasets demonstrate that FedPFT enhances the communication-accuracy frontier in both centralized and decentralized FL scenarios, as well as across diverse data-heterogeneity settings such as covariate shift and task shift, with improvements of up to 20.6%. Additionally, FedPFT adheres to the data minimization principle of FL, as clients do not send real features. We demonstrate that sending real features is vulnerable to potent reconstruction attacks. Moreover, we show that FedPFT is amenable to formal privacy guarantees via differential privacy, demonstrating favourable privacy-accuracy tradeoffs.
- [688] arXiv:2402.01863 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: DFML: Decentralized Federated Mutual LearningYasser H. Khalil , Amir H. Estiri , Mahdi Beitollahi , Nader Asadi , Sobhan Hemati , Xu Li , Guojun Zhang , Xi ChenSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: In the realm of real-world devices, centralized servers in Federated Learning (FL) present challenges including communication bottlenecks and susceptibility to a single point of failure. Additionally, contemporary devices inherently exhibit model and data heterogeneity. Existing work lacks a Decentralized FL (DFL) framework capable of accommodating such heterogeneity without imposing architectural restrictions or assuming the availability of public data. To address these issues, we propose a Decentralized Federated Mutual Learning (DFML) framework that is serverless, supports nonrestrictive heterogeneous models, and avoids reliance on public data. DFML effectively handles model and data heterogeneity through mutual learning, which distills knowledge between clients, and cyclically varying the amount of supervision and distillation signals. Extensive experimental results demonstrate consistent effectiveness of DFML in both convergence speed and global accuracy, outperforming prevalent baselines under various conditions. For example, with the CIFAR-100 dataset and 50 clients, DFML achieves a substantial increase of +17.20% and +19.95% in global accuracy under Independent and Identically Distributed (IID) and non-IID data shifts, respectively.
- [689] arXiv:2402.01864 (cross-list from cs.CY) [ pdf , ps , html , other ]
-
Title: (A)I Am Not a Lawyer, But...: Engaging Legal Experts towards Responsible LLM Policies for Legal AdviceComments: 14 pagesSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: Large language models (LLMs) are increasingly capable of providing users with advice in a wide range of professional domains, including legal advice. However, relying on LLMs for legal queries raises concerns due to the significant expertise required and the potential real-world consequences of the advice. To explore \textit{when} and \textit{why} LLMs should or should not provide advice to users, we conducted workshops with 20 legal experts using methods inspired by case-based reasoning. The provided realistic queries ("cases") allowed experts to examine granular, situation-specific concerns and overarching technical and legal constraints, producing a concrete set of contextual considerations for LLM developers. By synthesizing the factors that impacted LLM response appropriateness, we present a 4-dimension framework: (1) User attributes and behaviors, (2) Nature of queries, (3) AI capabilities, and (4) Social impacts. We share experts' recommendations for LLM response strategies, which center around helping users identify `right questions to ask' and relevant information rather than providing definitive legal judgments. Our findings reveal novel legal considerations, such as unauthorized practice of law, confidentiality, and liability for inaccurate advice, that have been overlooked in the literature. The case-based deliberation method enabled us to elicit fine-grained, practice-informed insights that surpass those from de-contextualized surveys or speculative principles. These findings underscore the applicability of our method for translating domain-specific professional knowledge and practices into policies that can guide LLM behavior in a more responsible direction.
- [690] arXiv:2402.01874 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: The RL/LLM Taxonomy Tree: Reviewing Synergies Between Reinforcement Learning and Large Language ModelsMoschoula Pternea , Prerna Singh , Abir Chakraborty , Yagna Oruganti , Mirco Milletari , Sayli Bapat , Kebei JiangComments: 30 pages (including bibliography), 1 figure, 7 tablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
Abstract: In this work, we review research studies that combine Reinforcement Learning (RL) and Large Language Models (LLMs), two areas that owe their momentum to the development of deep neural networks. We propose a novel taxonomy of three main classes based on the way that the two model types interact with each other. The first class, RL4LLM, includes studies where RL is leveraged to improve the performance of LLMs on tasks related to Natural Language Processing. L4LLM is divided into two sub-categories depending on whether RL is used to directly fine-tune an existing LLM or to improve the prompt of the LLM. In the second class, LLM4RL, an LLM assists the training of an RL model that performs a task that is not inherently related to natural language. We further break down LLM4RL based on the component of the RL training framework that the LLM assists or replaces, namely reward shaping, goal generation, and policy function. Finally, in the third class, RL+LLM, an LLM and an RL agent are embedded in a common planning framework without either of them contributing to training or fine-tuning of the other. We further branch this class to distinguish between studies with and without natural language feedback. We use this taxonomy to explore the motivations behind the synergy of LLMs and RL and explain the reasons for its success, while pinpointing potential shortcomings and areas where further research is needed, as well as alternative methodologies that serve the same goal.
- [691] arXiv:2402.01877 (cross-list from cs.HC) [ pdf , ps , html , other ]
-
Title: Mobile Fitting Room: On-device Virtual Try-on via Diffusion ModelsJustin Blalock , David Munechika , Harsha Karanth , Alec Helbling , Pratham Mehta , Seongmin Lee , Duen Horng ChauComments: 7 pages, 3 figuresSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The growing digital landscape of fashion e-commerce calls for interactive and user-friendly interfaces for virtually trying on clothes. Traditional try-on methods grapple with challenges in adapting to diverse backgrounds, poses, and subjects. While newer methods, utilizing the recent advances of diffusion models, have achieved higher-quality image generation, the human-centered dimensions of mobile interface delivery and privacy concerns remain largely unexplored. We present Mobile Fitting Room, the first on-device diffusion-based virtual try-on system. To address multiple inter-related technical challenges such as high-quality garment placement and model compression for mobile devices, we present a novel technical pipeline and an interface design that enables privacy preservation and user customization. A usage scenario highlights how our tool can provide a seamless, interactive virtual try-on experience for customers and provide a valuable service for fashion e-commerce businesses.
- [692] arXiv:2402.01881 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Large Language Model Agent for Hyper-Parameter OptimizationSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Hyperparameter optimization is critical in modern machine learning, requiring expert knowledge, numerous trials, and high computational and human resources. Despite the advancements in Automated Machine Learning (AutoML), challenges in terms of trial efficiency, setup complexity, and interoperability still persist. To address these issues, we introduce a novel paradigm leveraging Large Language Models (LLMs) to automate hyperparameter optimization across diverse machine learning tasks, which is named AgentHPO (short for LLM Agent-based Hyperparameter Optimization). Specifically, AgentHPO processes the task information autonomously, conducts experiments with specific hyperparameters (HPs), and iteratively optimizes them based on historical trials. This human-like optimization process largely reduces the number of required trials, simplifies the setup process, and enhances interpretability and user trust, compared to traditional AutoML methods. Extensive empirical experiments conducted on 12 representative machine-learning tasks indicate that AgentHPO not only matches but also often surpasses the best human trials in terms of performance while simultaneously providing explainable results. Further analysis sheds light on the strategies employed by the LLM in optimizing these tasks, highlighting its effectiveness and adaptability in various scenarios.
- [693] arXiv:2402.01886 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Inverse Reinforcement Learning by Estimating Expertise of DemonstratorsComments: 12 pages, 3 figures, preprintSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In Imitation Learning (IL), utilizing suboptimal and heterogeneous demonstrations presents a substantial challenge due to the varied nature of real-world data. However, standard IL algorithms consider these datasets as homogeneous, thereby inheriting the deficiencies of suboptimal demonstrators. Previous approaches to this issue typically rely on impractical assumptions like high-quality data subsets, confidence rankings, or explicit environmental knowledge. This paper introduces IRLEED, Inverse Reinforcement Learning by Estimating Expertise of Demonstrators, a novel framework that overcomes these hurdles without prior knowledge of demonstrator expertise. IRLEED enhances existing Inverse Reinforcement Learning (IRL) algorithms by combining a general model for demonstrator suboptimality to address reward bias and action variance, with a Maximum Entropy IRL framework to efficiently derive the optimal policy from diverse, suboptimal demonstrations. Experiments in both online and offline IL settings, with simulated and human-generated data, demonstrate IRLEED's adaptability and effectiveness, making it a versatile solution for learning from suboptimal demonstrations.
- [694] arXiv:2402.01909 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: On Catastrophic Inheritance of Large Foundation ModelsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: Large foundation models (LFMs) are claiming incredible performances. Yet great concerns have been raised about their mythic and uninterpreted potentials not only in machine learning, but also in various other disciplines. In this position paper, we propose to identify a neglected issue deeply rooted in LFMs: Catastrophic Inheritance, describing the weaknesses and limitations inherited from biased large-scale pre-training data to behaviors of LFMs on the downstream tasks, including samples that are corrupted, long-tailed, noisy, out-of-distributed, to name a few. Such inheritance can potentially cause catastrophes to downstream applications, such as bias, lack of generalization, deteriorated performance, security vulnerability, privacy leakage, and value misalignment. We discuss the challenges behind this issue and propose UIM, a framework to Understand the catastrophic inheritance of LFMs from both pre-training and downstream adaptation, Interpret the implications of catastrophic inheritance on downstream tasks, and how to Mitigate it. UIM aims to unite both the machine learning and social sciences communities for more responsible and promising AI development and deployment.
- [695] arXiv:2402.01920 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Preference Poisoning Attacks on Reward Model LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Learning utility, or reward, models from pairwise comparisons is a fundamental component in a number of application domains. These approaches inherently entail collecting preference information from people, with feedback often provided anonymously. Since preferences are subjective, there is no gold standard to compare against; yet, reliance of high-impact systems on preference learning creates a strong motivation for malicious actors to skew data collected in this fashion to their ends. We investigate the nature and extent of this vulnerability systematically by considering a threat model in which an attacker can flip a small subset of preference comparisons with the goal of either promoting or demoting a target outcome. First, we propose two classes of algorithmic approaches for these attacks: a principled gradient-based framework, and several variants of rank-by-distance methods. Next, we demonstrate the efficacy of best attacks in both these classes in successfully achieving malicious goals on datasets from three diverse domains: autonomous control, recommendation system, and textual prompt-response preference learning. We find that the best attacks are often highly successful, achieving in the most extreme case 100% success rate with only 0.3% of the data poisoned. However, which attack is best can vary significantly across domains, demonstrating the value of our comprehensive vulnerability analysis that involves several classes of attack algorithms. In addition, we observe that the simpler and more scalable rank-by-distance approaches are often competitive with the best, and on occasion significantly outperform gradient-based methods. Finally, we show that several state-of-the-art defenses against other classes of poisoning attacks exhibit, at best, limited efficacy in our setting.
- [696] arXiv:2402.01922 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: A General Framework for Learning from Weak SupervisionHao Chen , Jindong Wang , Lei Feng , Xiang Li , Yidong Wang , Xing Xie , Masashi Sugiyama , Rita Singh , Bhiksha RajSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Weakly supervised learning generally faces challenges in applicability to various scenarios with diverse weak supervision and in scalability due to the complexity of existing algorithms, thereby hindering the practical deployment. This paper introduces a general framework for learning from weak supervision (GLWS) with a novel algorithm. Central to GLWS is an Expectation-Maximization (EM) formulation, adeptly accommodating various weak supervision sources, including instance partial labels, aggregate statistics, pairwise observations, and unlabeled data. We further present an advanced algorithm that significantly simplifies the EM computational demands using a Non-deterministic Finite Automaton (NFA) along with a forward-backward algorithm, which effectively reduces time complexity from quadratic or factorial often required in existing solutions to linear scale. The problem of learning from arbitrary weak supervision is therefore converted to the NFA modeling of them. GLWS not only enhances the scalability of machine learning models but also demonstrates superior performance and versatility across 11 weak supervision scenarios. We hope our work paves the way for further advancements and practical deployment in this field.
- [697] arXiv:2402.01928 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Robust Counterfactual Explanations in Machine Learning: A SurveySubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Counterfactual explanations (CEs) are advocated as being ideally suited to providing algorithmic recourse for subjects affected by the predictions of machine learning models. While CEs can be beneficial to affected individuals, recent work has exposed severe issues related to the robustness of state-of-the-art methods for obtaining CEs. Since a lack of robustness may compromise the validity of CEs, techniques to mitigate this risk are in order. In this survey, we review works in the rapidly growing area of robust CEs and perform an in-depth analysis of the forms of robustness they consider. We also discuss existing solutions and their limitations, providing a solid foundation for future developments.
- [698] arXiv:2402.01955 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: OPSurv: Orthogonal Polynomials Quadrature Algorithm for Survival AnalysisLilian W. Bialokozowicz , Hoang M. Le , Tristan Sylvain , Peter A. I. Forsyth , Vineel Nagisetty , Greg MoriSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Functional Analysis (math.FA)
Abstract: This paper introduces the Orthogonal Polynomials Quadrature Algorithm for Survival Analysis (OPSurv), a new method providing time-continuous functional outputs for both single and competing risks scenarios in survival analysis. OPSurv utilizes the initial zero condition of the Cumulative Incidence function and a unique decomposition of probability densities using orthogonal polynomials, allowing it to learn functional approximation coefficients for each risk event and construct Cumulative Incidence Function estimates via Gauss--Legendre quadrature. This approach effectively counters overfitting, particularly in competing risks scenarios, enhancing model expressiveness and control. The paper further details empirical validations and theoretical justifications of OPSurv, highlighting its robust performance as an advancement in survival analysis with competing risks.
- [699] arXiv:2402.01968 (cross-list from cs.MA) [ pdf , ps , html , other ]
-
Title: A Survey on Context-Aware Multi-Agent Systems: Techniques, Challenges and Future DirectionsComments: 11 pages, 1 figureSubjects: Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Research interest in autonomous agents is on the rise as an emerging topic. The notable achievements of Large Language Models (LLMs) have demonstrated the considerable potential to attain human-like intelligence in autonomous agents. However, the challenge lies in enabling these agents to learn, reason, and navigate uncertainties in dynamic environments. Context awareness emerges as a pivotal element in fortifying multi-agent systems when dealing with dynamic situations. Despite existing research focusing on both context-aware systems and multi-agent systems, there is a lack of comprehensive surveys outlining techniques for integrating context-aware systems with multi-agent systems. To address this gap, this survey provides a comprehensive overview of state-of-the-art context-aware multi-agent systems. First, we outline the properties of both context-aware systems and multi-agent systems that facilitate integration between these systems. Subsequently, we propose a general process for context-aware systems, with each phase of the process encompassing diverse approaches drawn from various application domains such as collision avoidance in autonomous driving, disaster relief management, utility management, supply chain management, human-AI interaction, and others. Finally, we discuss the existing challenges of context-aware multi-agent systems and provide future research directions in this field.
- [700] arXiv:2402.01981 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Self-Debiasing Large Language Models: Zero-Shot Recognition and Reduction of StereotypesIsabel O. Gallegos , Ryan A. Rossi , Joe Barrow , Md Mehrab Tanjim , Tong Yu , Hanieh Deilamsalehy , Ruiyi Zhang , Sungchul Kim , Franck DernoncourtSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Abstract: Large language models (LLMs) have shown remarkable advances in language generation and understanding but are also prone to exhibiting harmful social biases. While recognition of these behaviors has generated an abundance of bias mitigation techniques, most require modifications to the training data, model parameters, or decoding strategy, which may be infeasible without access to a trainable model. In this work, we leverage the zero-shot capabilities of LLMs to reduce stereotyping in a technique we introduce as zero-shot self-debiasing. With two approaches, self-debiasing via explanation and self-debiasing via reprompting, we show that self-debiasing can significantly reduce the degree of stereotyping across nine different social groups while relying only on the LLM itself and a simple prompt, with explanations correctly identifying invalid assumptions and reprompting delivering the greatest reductions in bias. We hope this work opens inquiry into other zero-shot techniques for bias mitigation.
- [701] arXiv:2402.01987 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Online Transfer Learning for RSV Case DetectionYiming Sun , Yuhe Gao , Runxue Bao , Gregory F. Cooper , Jessi Espino , Harry Hochheiser , Marian G. Michaels , John M. Aronis , Chenxi Song , Ye YeComments: 10 pages, 2 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Transfer learning has become a pivotal technique in machine learning and has proven to be effective in various real-world applications. However, utilizing this technique for classification tasks with sequential data often faces challenges, primarily attributed to the scarcity of class labels. To address this challenge, we introduce Multi-Source Adaptive Weighting (MSAW), an online multi-source transfer learning method. MSAW integrates a dynamic weighting mechanism into an ensemble framework, enabling automatic adjustment of weights based on the relevance and contribution of each source (representing historical knowledge) and target model (learning from newly acquired data). We demonstrate the effectiveness of MSAW by applying it to detect Respiratory Syncytial Virus cases within Emergency Department visits, utilizing multiple years of electronic health records from the University of Pittsburgh Medical Center. Our method demonstrates performance improvements over many baselines, including refining pre-trained models with online learning as well as three static weighting approaches, showing MSAW's capacity to integrate historical knowledge with progressively accumulated new data. This study indicates the potential of online transfer learning in healthcare, particularly for developing machine learning models that dynamically adapt to evolving situations where new data is incrementally accumulated.
- [702] arXiv:2402.01994 (cross-list from cs.HC) [ pdf , ps , html , other ]
-
Title: Human-Centered Privacy Research in the Age of Large Language ModelsComments: 4 pages, CHI EA'24Subjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: The emergence of large language models (LLMs), and their increased use in user-facing systems, has led to substantial privacy concerns. To date, research on these privacy concerns has been model-centered: exploring how LLMs lead to privacy risks like memorization, or can be used to infer personal characteristics about people from their content. We argue that there is a need for more research focusing on the human aspect of these privacy issues: e.g., research on how design paradigms for LLMs affect users' disclosure behaviors, users' mental models and preferences for privacy controls, and the design of tools, systems, and artifacts that empower end-users to reclaim ownership over their personal data. To build usable, efficient, and privacy-friendly systems powered by these models with imperfect privacy properties, our goal is to initiate discussions to outline an agenda for conducting human-centered research on privacy issues in LLM-powered systems. This Special Interest Group (SIG) aims to bring together researchers with backgrounds in usable security and privacy, human-AI collaboration, NLP, or any other related domains to share their perspectives and experiences on this problem, to help our community establish a collective understanding of the challenges, research opportunities, research methods, and strategies to collaborate with researchers outside of HCI.
- [703] arXiv:2402.01999 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: A Novel Hyperdimensional Computing Framework for Online Time Series Forecasting on the EdgeSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In recent years, both online and offline deep learning models have been developed for time series forecasting. However, offline deep forecasting models fail to adapt effectively to changes in time-series data, while online deep forecasting models are often expensive and have complex training procedures. In this paper, we reframe the online nonlinear time-series forecasting problem as one of linear hyperdimensional time-series forecasting. Nonlinear low-dimensional time-series data is mapped to high-dimensional (hyperdimensional) spaces for linear hyperdimensional prediction, allowing fast, efficient and lightweight online time-series forecasting. Our framework, TSF-HD, adapts to time-series distribution shifts using a novel co-training framework for its hyperdimensional mapping and its linear hyperdimensional predictor. TSF-HD is shown to outperform the state of the art, while having reduced inference latency, for both short-term and long-term time series forecasting. Our code is publicly available at this http URL
- [704] arXiv:2402.02008 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: How well do LLMs cite relevant medical references? An evaluation framework and analysesKevin Wu , Eric Wu , Ally Cassasola , Angela Zhang , Kevin Wei , Teresa Nguyen , Sith Riantawan , Patricia Shi Riantawan , Daniel E. Ho , James ZouSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large language models (LLMs) are currently being used to answer medical questions across a variety of clinical domains. Recent top-performing commercial LLMs, in particular, are also capable of citing sources to support their responses. In this paper, we ask: do the sources that LLMs generate actually support the claims that they make? To answer this, we propose three contributions. First, as expert medical annotations are an expensive and time-consuming bottleneck for scalable evaluation, we demonstrate that GPT-4 is highly accurate in validating source relevance, agreeing 88% of the time with a panel of medical doctors. Second, we develop an end-to-end, automated pipeline called \textit{SourceCheckup} and use it to evaluate five top-performing LLMs on a dataset of 1200 generated questions, totaling over 40K pairs of statements and sources. Interestingly, we find that between ~50% to 90% of LLM responses are not fully supported by the sources they provide. We also evaluate GPT-4 with retrieval augmented generation (RAG) and find that, even still, around 30\% of individual statements are unsupported, while nearly half of its responses are not fully supported. Third, we open-source our curated dataset of medical questions and expert annotations for future evaluations. Given the rapid pace of LLM development and the potential harms of incorrect or outdated medical information, it is crucial to also understand and quantify their capability to produce relevant, trustworthy medical references.
- [705] arXiv:2402.02023 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Self-Supervised Contrastive Learning for Long-term ForecastingComments: Accepted at International Conference on Learning Representations (ICLR) 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Long-term forecasting presents unique challenges due to the time and memory complexity of handling long sequences. Existing methods, which rely on sliding windows to process long sequences, struggle to effectively capture long-term variations that are partially caught within the short window (i.e., outer-window variations). In this paper, we introduce a novel approach that overcomes this limitation by employing contrastive learning and enhanced decomposition architecture, specifically designed to focus on long-term variations. To this end, our contrastive loss incorporates global autocorrelation held in the whole time series, which facilitates the construction of positive and negative pairs in a self-supervised manner. When combined with our decomposition networks, our contrastive learning significantly improves long-term forecasting performance. Extensive experiments demonstrate that our approach outperforms 14 baseline models in multiple experiments over nine long-term benchmarks, especially in challenging scenarios that require a significantly long output for forecasting. Source code is available at this https URL .
- [706] arXiv:2402.02025 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: A Survey of Constraint Formulations in Safe Reinforcement LearningComments: Accepted at IJCAI-24 survey trackSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Safety is critical when applying reinforcement learning (RL) to real-world problems. As a result, safe RL has emerged as a fundamental and powerful paradigm for optimizing an agent's policy while incorporating notions of safety. A prevalent safe RL approach is based on a constrained criterion, which seeks to maximize the expected cumulative reward subject to specific safety constraints. Despite recent effort to enhance safety in RL, a systematic understanding of the field remains difficult. This challenge stems from the diversity of constraint representations and little exploration of their interrelations. To bridge this knowledge gap, we present a comprehensive review of representative constraint formulations, along with a curated selection of algorithms designed specifically for each formulation. In addition, we elucidate the theoretical underpinnings that reveal the mathematical mutual relations among common problem formulations. We conclude with a discussion of the current state and future directions of safe reinforcement learning research.
- [707] arXiv:2402.02026 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Multimodal-Enhanced Objectness Learner for Corner Case Detection in Autonomous DrivingComments: 7 pages,6 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Previous works on object detection have achieved high accuracy in closed-set scenarios, but their performance in open-world scenarios is not satisfactory. One of the challenging open-world problems is corner case detection in autonomous driving. Existing detectors struggle with these cases, relying heavily on visual appearance and exhibiting poor generalization ability. In this paper, we propose a solution by reducing the discrepancy between known and unknown classes and introduce a multimodal-enhanced objectness notion learner. Leveraging both vision-centric and image-text modalities, our semi-supervised learning framework imparts objectness knowledge to the student model, enabling class-aware detection. Our approach, Multimodal-Enhanced Objectness Learner (MENOL) for Corner Case Detection, significantly improves recall for novel classes with lower training costs. By achieving a 76.6% mAR-corner and 79.8% mAR-agnostic on the CODA-val dataset with just 5100 labeled training images, MENOL outperforms the baseline ORE by 71.3% and 60.6%, respectively. The code will be available at this https URL .
- [708] arXiv:2402.02029 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: ScribFormer: Transformer Makes CNN Work Better for Scribble-based Medical Image SegmentationZihan Li , Yuan Zheng , Dandan Shan , Shuzhou Yang , Qingde Li , Beizhan Wang , Yuanting Zhang , Qingqi Hong , Dinggang ShenComments: Accepted by IEEE Transactions on Medical Imaging (TMI)Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Most recent scribble-supervised segmentation methods commonly adopt a CNN framework with an encoder-decoder architecture. Despite its multiple benefits, this framework generally can only capture small-range feature dependency for the convolutional layer with the local receptive field, which makes it difficult to learn global shape information from the limited information provided by scribble annotations. To address this issue, this paper proposes a new CNN-Transformer hybrid solution for scribble-supervised medical image segmentation called ScribFormer. The proposed ScribFormer model has a triple-branch structure, i.e., the hybrid of a CNN branch, a Transformer branch, and an attention-guided class activation map (ACAM) branch. Specifically, the CNN branch collaborates with the Transformer branch to fuse the local features learned from CNN with the global representations obtained from Transformer, which can effectively overcome limitations of existing scribble-supervised segmentation methods. Furthermore, the ACAM branch assists in unifying the shallow convolution features and the deep convolution features to improve model's performance further. Extensive experiments on two public datasets and one private dataset show that our ScribFormer has superior performance over the state-of-the-art scribble-supervised segmentation methods, and achieves even better results than the fully-supervised segmentation methods. The code is released at this https URL .
- [709] arXiv:2402.02042 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Learning General Parameterized Policies for Infinite Horizon Average Reward Constrained MDPs via Primal-Dual Policy Gradient AlgorithmComments: We fixed Lemma 6 in v2 which changed the final resultSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: This paper explores the realm of infinite horizon average reward Constrained Markov Decision Processes (CMDP). To the best of our knowledge, this work is the first to delve into the regret and constraint violation analysis of average reward CMDPs with a general policy parametrization. To address this challenge, we propose a primal dual based policy gradient algorithm that adeptly manages the constraints while ensuring a low regret guarantee toward achieving a global optimal policy. In particular, we demonstrate that our proposed algorithm achieves $\tilde{\mathcal{O}}({T}^{4/5})$ objective regret and $\tilde{\mathcal{O}}({T}^{4/5})$ constraint violation bounds.
- [710] arXiv:2402.02043 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: A Plug-in Tiny AI Module for Intelligent and Selective Sensor Data TransmissionWenjun Huang , Arghavan Rezvani , Hanning Chen , Yang Ni , Sanggeon Yun , Sungheon Jeong , Mohsen ImaniComments: 14 pages, 6 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)
Abstract: Applications in the Internet of Things (IoT) utilize machine learning to analyze sensor-generated data. However, a major challenge lies in the lack of targeted intelligence in current sensing systems, leading to vast data generation and increased computational and communication costs. To address this challenge, we propose a novel sensing module to equip sensing frameworks with intelligent data transmission capabilities by integrating a highly efficient machine learning model placed near the sensor. This model provides prompt feedback for the sensing system to transmit only valuable data while discarding irrelevant information by regulating the frequency of data transmission. The near-sensor model is quantized and optimized for real-time sensor control. To enhance the framework's performance, the training process is customized and a "lazy" sensor deactivation strategy utilizing temporal information is introduced. The suggested method is orthogonal to other IoT frameworks and can be considered as a plugin for selective data transmission. The framework is implemented, encompassing both software and hardware components. The experiments demonstrate that the framework utilizing the suggested module achieves over 85% system efficiency in terms of energy consumption and storage, with negligible impact on performance. This methodology has the potential to significantly reduce data output from sensors, benefiting a wide range of IoT applications.
- [711] arXiv:2402.02054 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Neural Scaling Laws on GraphsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Deep graph models (e.g., graph neural networks and graph transformers) have become important techniques for leveraging knowledge across various types of graphs. Yet, the scaling properties of deep graph models have not been systematically investigated, casting doubt on the feasibility of achieving large graph models through enlarging the model and dataset sizes. In this work, we delve into neural scaling laws on graphs from both model and data perspectives. We first verify the validity of such laws on graphs, establishing formulations to describe the scaling behaviors. For model scaling, we investigate the phenomenon of scaling law collapse and identify overfitting as the potential reason. Moreover, we reveal that the model depth of deep graph models can impact the model scaling behaviors, which differ from observations in other domains such as CV and NLP. For data scaling, we suggest that the number of graphs can not effectively metric the graph data volume in scaling law since the sizes of different graphs are highly irregular. Instead, we reform the data scaling law with the number of edges as the metric to address the irregular graph sizes. We further demonstrate the reformed law offers a unified view of the data scaling behaviors for various fundamental graph tasks including node classification, link prediction, and graph classification. This work provides valuable insights into neural scaling laws on graphs, which can serve as an essential step toward large graph models.
- [712] arXiv:2402.02055 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Variance Alignment Score: A Simple But Tough-to-Beat Data Selection Method for Multimodal Contrastive LearningComments: 17 pages, 4 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In recent years, data selection has emerged as a core issue for large-scale visual-language model pretraining, especially on noisy web-curated datasets. One widely adopted strategy assigns quality scores such as CLIP similarity for each sample and retains the data pairs with the highest scores. However, these approaches are agnostic of data distribution and always fail to select the most informative samples. To solve this problem, we propose a simple yet theoretically principled metric named Variance Alignment Score (VAS), which has the form $\langle \Sigma_{\text{test}}, \Sigma_i\rangle$. Here, $\Sigma_{\text{test}}$ represents the target (cross-)covariance matrix we aim to align, potentially based on prior knowledge, while $\Sigma_i$ denotes the tensor product of single or multi-modal representations for the $i$-th sample. We further design a new data selection method that maximizes the total VAS. We provide theoretical analysis in a simplified setting to demonstrate the theoretical advantage of VAS over random or other existing data selection. Experimentally, applying VAS and CLIP scores together can outperform baselines by a margin of $1.3\%$ average on 38 evaluation sets for noisy dataset DataComp and $2.5\%$ on VTAB for high-quality dataset CC12M. Additionally, our ablation study also shows visual features are better than text for calculating VAS, and the related classical experimental design methods may fail under this context.
- [713] arXiv:2402.02056 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: AnthroScore: A Computational Linguistic Measure of AnthropomorphismComments: EACL 2024 Main ConferenceSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: Anthropomorphism, or the attribution of human-like characteristics to non-human entities, has shaped conversations about the impacts and possibilities of technology. We present AnthroScore, an automatic metric of implicit anthropomorphism in language. We use a masked language model to quantify how non-human entities are implicitly framed as human by the surrounding context. We show that AnthroScore corresponds with human judgments of anthropomorphism and dimensions of anthropomorphism described in social science literature. Motivated by concerns of misleading anthropomorphism in computer science discourse, we use AnthroScore to analyze 15 years of research papers and downstream news articles. In research papers, we find that anthropomorphism has steadily increased over time, and that papers related to language models have the most anthropomorphism. Within ACL papers, temporal increases in anthropomorphism are correlated with key neural advancements. Building upon concerns of scientific misinformation in mass media, we identify higher levels of anthropomorphism in news headlines compared to the research papers they cite. Since AnthroScore is lexicon-free, it can be directly applied to a wide range of text sources.
- [714] arXiv:2402.02066 (cross-list from cs.SI) [ pdf , ps , other ]
-
Title: Trustworthiness of $\mathbb{X}$ Users: A One-Class Classification ApproachSubjects: Social and Information Networks (cs.SI) ; Artificial Intelligence (cs.AI)
Abstract: $\mathbb{X}$ (formerly Twitter) is a prominent online social media platform that plays an important role in sharing information making the content generated on this platform a valuable source of information. Ensuring trust on $\mathbb{X}$ is essential to determine the user credibility and prevents issues across various domains. While assigning credibility to $\mathbb{X}$ users and classifying them as trusted or untrusted is commonly carried out using traditional machine learning models, there is limited exploration about the use of One-Class Classification (OCC) models for this purpose. In this study, we use various OCC models for $\mathbb{X}$ user classification. Additionally, we propose using a subspace-learning-based approach that simultaneously optimizes both the subspace and data description for OCC. We also introduce a novel regularization term for Subspace Support Vector Data Description (SSVDD), expressing data concentration in a lower-dimensional subspace that captures diverse graph structures. Experimental results show superior performance of the introduced regularization term for SSVDD compared to baseline models and state-of-the-art techniques for $\mathbb{X}$ user classification.
- [715] arXiv:2402.02079 (cross-list from cs.IR) [ pdf , ps , html , other ]
-
Title: Prototypical Contrastive Learning through Alignment and Uniformity for RecommendationSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI)
Abstract: Graph Collaborative Filtering (GCF), one of the most widely adopted recommendation system methods, effectively captures intricate relationships between user and item interactions. Graph Contrastive Learning (GCL) based GCF has gained significant attention as it leverages self-supervised techniques to extract valuable signals from real-world scenarios. However, many methods usually learn the instances of discrimination tasks that involve the construction of contrastive pairs through random sampling. GCL approaches suffer from sampling bias issues, where the negatives might have a semantic structure similar to that of the positives, thus leading to a loss of effective feature representation. To address these problems, we present the \underline{Proto}typical contrastive learning through \underline{A}lignment and \underline{U}niformity for recommendation, which is called \textbf{ProtoAU}. Specifically, we first propose prototypes (cluster centroids) as a latent space to ensure consistency across different augmentations from the origin graph, aiming to eliminate the need for random sampling of contrastive pairs. Furthermore, the absence of explicit negatives means that directly optimizing the consistency loss between instance and prototype could easily result in dimensional collapse issues. Therefore, we propose aligning and maintaining uniformity in the prototypes of users and items as optimization objectives to prevent falling into trivial solutions. Finally, we conduct extensive experiments on four datasets and evaluate their performance on the task of link prediction. Experimental results demonstrate that the proposed ProtoAU outperforms other representative methods. The source codes of our proposed ProtoAU are available at \url{ this https URL }.
- [716] arXiv:2402.02085 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: DeCoF: Generated Video Detection via Frame ConsistencySubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: The escalating quality of video generated by advanced video generation methods leads to new security challenges in society, which makes generated video detection an urgent research priority. To foster collaborative research in this area, we construct the first open-source dataset explicitly for generated video detection, providing a valuable resource for the community to benchmark and improve detection methodologies. Through a series of carefully designed probe experiments, our study explores the significance of temporal and spatial artifacts in developing general and robust detectors for generated video. Based on the principle of video frame consistency, we introduce a simple yet effective detection model (DeCoF) that eliminates the impact of spatial artifacts during generalizing feature learning. Our extensive experiments demonstrate the efficacy of DeCoF in detecting videos produced by unseen video generation models and confirm its powerful generalization capabilities across several commercial proprietary models.
- [717] arXiv:2402.02094 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Deep Semantic-Visual Alignment for Zero-Shot Remote Sensing Image Scene ClassificationComments: Published in ISPRS P&RS. The code is available at this https URLJournal-ref: ISPRS Journal of Photogrammetry and Remote Sensing, Volume 198, 2023, Pages 140-152Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Deep neural networks have achieved promising progress in remote sensing (RS) image classification, for which the training process requires abundant samples for each class. However, it is time-consuming and unrealistic to annotate labels for each RS category, given the fact that the RS target database is increasing dynamically. Zero-shot learning (ZSL) allows for identifying novel classes that are not seen during training, which provides a promising solution for the aforementioned problem. However, previous ZSL models mainly depend on manually-labeled attributes or word embeddings extracted from language models to transfer knowledge from seen classes to novel classes. Besides, pioneer ZSL models use convolutional neural networks pre-trained on ImageNet, which focus on the main objects appearing in each image, neglecting the background context that also matters in RS scene classification. To address the above problems, we propose to collect visually detectable attributes automatically. We predict attributes for each class by depicting the semantic-visual similarity between attributes and images. In this way, the attribute annotation process is accomplished by machine instead of human as in other methods. Moreover, we propose a Deep Semantic-Visual Alignment (DSVA) that take advantage of the self-attention mechanism in the transformer to associate local image regions together, integrating the background context information for prediction. The DSVA model further utilizes the attribute attention maps to focus on the informative image regions that are essential for knowledge transfer in ZSL, and maps the visual images into attribute space to perform ZSL classification. With extensive experiments, we show that our model outperforms other state-of-the-art models by a large margin on a challenging large-scale RS scene classification benchmark.
- [718] arXiv:2402.02097 (cross-list from cs.MA) [ pdf , ps , other ]
-
Title: Settling Decentralized Multi-Agent Coordinated Exploration by Novelty SharingComments: 17 pages, 15 figuresSubjects: Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Exploration in decentralized cooperative multi-agent reinforcement learning faces two challenges. One is that the novelty of global states is unavailable, while the novelty of local observations is biased. The other is how agents can explore in a coordinated way. To address these challenges, we propose MACE, a simple yet effective multi-agent coordinated exploration method. By communicating only local novelty, agents can take into account other agents' local novelty to approximate the global novelty. Further, we newly introduce weighted mutual information to measure the influence of one agent's action on other agents' accumulated novelty. We convert it as an intrinsic reward in hindsight to encourage agents to exert more influence on other agents' exploration and boost coordinated exploration. Empirically, we show that MACE achieves superior performance in three multi-agent environments with sparse rewards.
- [719] arXiv:2402.02099 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Analyzing the Evaluation of Cross-Lingual Knowledge Transfer in Multilingual Language ModelsComments: Accepted to EACL 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Recent advances in training multilingual language models on large datasets seem to have shown promising results in knowledge transfer across languages and achieve high performance on downstream tasks. However, we question to what extent the current evaluation benchmarks and setups accurately measure zero-shot cross-lingual knowledge transfer. In this work, we challenge the assumption that high zero-shot performance on target tasks reflects high cross-lingual ability by introducing more challenging setups involving instances with multiple languages. Through extensive experiments and analysis, we show that the observed high performance of multilingual models can be largely attributed to factors not requiring the transfer of actual linguistic knowledge, such as task- and surface-level knowledge. More specifically, we observe what has been transferred across languages is mostly data artifacts and biases, especially for low-resource languages. Our findings highlight the overlooked drawbacks of existing cross-lingual test data and evaluation setups, calling for a more nuanced understanding of the cross-lingual capabilities of multilingual models.
- [720] arXiv:2402.02101 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Are Large Language Models Good Prompt Optimizers?Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: LLM-based Automatic Prompt Optimization, which typically utilizes LLMs as Prompt Optimizers to self-reflect and refine prompts, has shown promising performance in recent studies. Despite the success, the underlying mechanism of this approach remains unexplored, and the true effectiveness of LLMs as Prompt Optimizers requires further validation. In this work, we conducted a comprehensive study to uncover the actual mechanism of LLM-based Prompt Optimization. Our findings reveal that the LLM optimizers struggle to identify the true causes of errors during reflection, tending to be biased by their own prior knowledge rather than genuinely reflecting on the errors. Furthermore, even when the reflection is semantically valid, the LLM optimizers often fail to generate appropriate prompts for the target models with a single prompt refinement step, partly due to the unpredictable behaviors of the target models. Based on the observations, we introduce a new "Automatic Behavior Optimization" paradigm, which directly optimizes the target model's behavior in a more controllable manner. We hope our study can inspire new directions for automatic prompt optimization development.
- [721] arXiv:2402.02110 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Composite Active Learning: Towards Multi-Domain Active Learning with Theoretical GuaranteesJournal-ref: AAAI 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
Abstract: Active learning (AL) aims to improve model performance within a fixed labeling budget by choosing the most informative data points to label. Existing AL focuses on the single-domain setting, where all data come from the same domain (e.g., the same dataset). However, many real-world tasks often involve multiple domains. For example, in visual recognition, it is often desirable to train an image classifier that works across different environments (e.g., different backgrounds), where images from each environment constitute one domain. Such a multi-domain AL setting is challenging for prior methods because they (1) ignore the similarity among different domains when assigning labeling budget and (2) fail to handle distribution shift of data across different domains. In this paper, we propose the first general method, dubbed composite active learning (CAL), for multi-domain AL. Our approach explicitly considers the domain-level and instance-level information in the problem; CAL first assigns domain-level budgets according to domain-level importance, which is estimated by optimizing an upper error bound that we develop; with the domain-level budgets, CAL then leverages a certain instance-level query strategy to select samples to label from each domain. Our theoretical analysis shows that our method achieves a better error bound compared to current AL methods. Our empirical results demonstrate that our approach significantly outperforms the state-of-the-art AL methods on both synthetic and real-world multi-domain datasets. Code is available at this https URL .
- [722] arXiv:2402.02135 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Do Moral Judgment and Reasoning Capability of LLMs Change with Language? A Study using the Multilingual Defining Issues TestComments: Accepted to EACL 2024 (main)Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: This paper explores the moral judgment and moral reasoning abilities exhibited by Large Language Models (LLMs) across languages through the Defining Issues Test. It is a well known fact that moral judgment depends on the language in which the question is asked. We extend the work of beyond English, to 5 new languages (Chinese, Hindi, Russian, Spanish and Swahili), and probe three LLMs -- ChatGPT, GPT-4 and Llama2Chat-70B -- that shows substantial multilingual text processing and generation abilities. Our study shows that the moral reasoning ability for all models, as indicated by the post-conventional score, is substantially inferior for Hindi and Swahili, compared to Spanish, Russian, Chinese and English, while there is no clear trend for the performance of the latter four languages. The moral judgments too vary considerably by the language.
- [723] arXiv:2402.02150 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Data-Driven Prediction of Seismic Intensity Distributions Featuring Hybrid Classification-Regression ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Earthquakes are among the most immediate and deadly natural disasters that humans face. Accurately forecasting the extent of earthquake damage and assessing potential risks can be instrumental in saving numerous lives. In this study, we developed linear regression models capable of predicting seismic intensity distributions based on earthquake parameters: location, depth, and magnitude. Because it is completely data-driven, it can predict intensity distributions without geographical information. The dataset comprises seismic intensity data from earthquakes that occurred in the vicinity of Japan between 1997 and 2020, specifically containing 1,857 instances of earthquakes with a magnitude of 5.0 or greater, sourced from the Japan Meteorological Agency. We trained both regression and classification models and combined them to take advantage of both to create a hybrid model. The proposed model outperformed commonly used Ground Motion Prediction Equations (GMPEs) in terms of the correlation coefficient, F1 score, and MCC. Furthermore, the proposed model can predict even abnormal seismic intensity distributions, a task at conventional GMPEs often struggle.
- [724] arXiv:2402.02167 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: Vi(E)va LLM! A Conceptual Stack for Evaluating and Interpreting Generative AI-based VisualizationsSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The automatic generation of visualizations is an old task that, through the years, has shown more and more interest from the research and practitioner communities. Recently, large language models (LLM) have become an interesting option for supporting generative tasks related to visualization, demonstrating initial promising results. At the same time, several pitfalls, like the multiple ways of instructing an LLM to generate the desired result, the different perspectives leading the generation (code-based, image-based, grammar-based), and the presence of hallucinations even for the visualization generation task, make their usage less affordable than expected. Following similar initiatives for benchmarking LLMs, this paper copes with the problem of modeling the evaluation of a generated visualization through an LLM. We propose a theoretical evaluation stack, EvaLLM, that decomposes the evaluation effort in its atomic components, characterizes their nature, and provides an overview of how to implement and interpret them. We also designed and implemented an evaluation platform that provides a benchmarking resource for the visualization generation task. The platform supports automatic and manual scoring conducted by multiple assessors to support a fine-grained and semantic evaluation based on the EvaLLM stack. Two case studies on GPT3.5-turbo with Code Interpreter and Llama2-70-b models show the benefits of EvaLLM and illustrate interesting results on the current state-of-the-art LLM-generated visualizations.
- [725] arXiv:2402.02168 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: One Graph Model for Cross-domain Dynamic Link PredictionComments: Under reviewSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Abstract: This work proposes DyExpert, a dynamic graph model for cross-domain link prediction. It can explicitly model historical evolving processes to learn the evolution pattern of a specific downstream graph and subsequently make pattern-specific link predictions. DyExpert adopts a decode-only transformer and is capable of efficiently parallel training and inference by \textit{conditioned link generation} that integrates both evolution modeling and link prediction. DyExpert is trained by extensive dynamic graphs across diverse domains, comprising 6M dynamic edges. Extensive experiments on eight untrained graphs demonstrate that DyExpert achieves state-of-the-art performance in cross-domain link prediction. Compared to the advanced baseline under the same setting, DyExpert achieves an average of 11.40% improvement Average Precision across eight graphs. More impressive, it surpasses the fully supervised performance of 8 advanced baselines on 6 untrained graphs.
- [726] arXiv:2402.02181 (cross-list from cs.SI) [ pdf , ps , other ]
-
Title: An Ontology-Based multi-domain model in Social Network Analysis: Experimental validation and case studyJosé Alberto Benítez-Andrades , Isaías García-Rodríguez , Carmen Benavides , Héctor Aláiz-Moretón , José Emilio Labra GayoJournal-ref: Information Sciences, Volume 540, November 2020, Pages 390-413Subjects: Social and Information Networks (cs.SI) ; Artificial Intelligence (cs.AI)
Abstract: The use of social network theory and methods of analysis have been applied to different domains in recent years, including public health. The complete procedure for carrying out a social network analysis (SNA) is a time-consuming task that entails a series of steps in which the expert in social network analysis could make mistakes. This research presents a multi-domain knowledge model capable of automatically gathering data and carrying out different social network analyses in different domains, without errors and obtaining the same conclusions that an expert in SNA would obtain. The model is represented in an ontology called OntoSNAQA, which is made up of classes, properties and rules representing the domains of People, Questionnaires and Social Network Analysis. Besides the ontology itself, different rules are represented by SWRL and SPARQL queries. A Knowledge Based System was created using OntoSNAQA and applied to a real case study in order to show the advantages of the approach. Finally, the results of an SNA analysis obtained through the model were compared to those obtained from some of the most widely used SNA applications: UCINET, Pajek, Cytoscape and Gephi, to test and confirm the validity of the model.
- [727] arXiv:2402.02186 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Evolution Guided Generative Flow NetworksComments: 16 pages, 16 figuesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Generative Flow Networks (GFlowNets) are a family of probabilistic generative models that learn to sample compositional objects proportional to their rewards. One big challenge of GFlowNets is training them effectively when dealing with long time horizons and sparse rewards. To address this, we propose Evolution guided generative flow networks (EGFN), a simple but powerful augmentation to the GFlowNets training using Evolutionary algorithms (EA). Our method can work on top of any GFlowNets training objective, by training a set of agent parameters using EA, storing the resulting trajectories in the prioritized replay buffer, and training the GFlowNets agent using the stored trajectories. We present a thorough investigation over a wide range of toy and real-world benchmark tasks showing the effectiveness of our method in handling long trajectories and sparse rewards.
- [728] arXiv:2402.02218 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Machine Intelligence in Africa: a surveyComments: Accepted and to be presented at DSAI 2024Subjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: In the last 5 years, the availability of large audio datasets in African countries has opened unlimited opportunities to build machine intelligence (MI) technologies that are closer to the people and speak, learn, understand, and do businesses in local languages, including for those who cannot read and write. Unfortunately, these audio datasets are not fully exploited by current MI tools, leaving several Africans out of MI business opportunities. Additionally, many state-of-the-art MI models are not culture-aware, and the ethics of their adoption indexes are questionable. The lack thereof is a major drawback in many applications in Africa. This paper summarizes recent developments in machine intelligence in Africa from a multi-layer multiscale and culture-aware ethics perspective, showcasing MI use cases in 54 African countries through 400 articles on MI research, industry, government actions, as well as uses in art, music, the informal economy, and small businesses in Africa. The survey also opens discussions on the reliability of MI rankings and indexes in the African continent as well as algorithmic definitions of unclear terms used in MI.
- [729] arXiv:2402.02230 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Federated Learning with Differential PrivacyComments: Machine Learning (ML) & Federated Learning (FL); 4 pages, 3 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: Federated learning (FL), as a type of distributed machine learning, is capable of significantly preserving client's private data from being shared among different parties. Nevertheless, private information can still be divulged by analyzing uploaded parameter weights from clients. In this report, we showcase our empirical benchmark of the effect of the number of clients and the addition of differential privacy (DP) mechanisms on the performance of the model on different types of data. Our results show that non-i.i.d and small datasets have the highest decrease in performance in a distributed and differentially private setting.
- [730] arXiv:2402.02246 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: ExTTNet: A Deep Learning Algorithm for Extracting Table Texts from Invoice ImagesComments: 6 pages, 4 figures, 3 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Abstract: In this work, product tables in invoices are obtained autonomously via a deep learning model, which is named as ExTTNet. Firstly, text is obtained from invoice images using Optical Character Recognition (OCR) techniques. Tesseract OCR engine [37] is used for this process. Afterwards, the number of existing features is increased by using feature extraction methods to increase the accuracy. Labeling process is done according to whether each text obtained as a result of OCR is a table element or not. In this study, a multilayer artificial neural network model is used. The training has been carried out with an Nvidia RTX 3090 graphics card and taken $162$ minutes. As a result of the training, the F1 score is $0.92$.
- [731] arXiv:2402.02258 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: XTSFormer: Cross-Temporal-Scale Transformer for Irregular Time Event PredictionTingsong Xiao , Zelin Xu , Wenchong He , Jim Su , Yupu Zhang , Raymond Opoku , Ronald Ison , Jason Petho , Jiang Bian , Patrick Tighe , Parisa Rashidi , Zhe JiangSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Event prediction aims to forecast the time and type of a future event based on a historical event sequence. Despite its significance, several challenges exist, including the irregularity of time intervals between consecutive events, the existence of cycles, periodicity, and multi-scale event interactions, as well as the high computational costs for long event sequences. Existing neural temporal point processes (TPPs) methods do not capture the multi-scale nature of event interactions, which is common in many real-world applications such as clinical event data. To address these issues, we propose the cross-temporal-scale transformer (XTSFormer), designed specifically for irregularly timed event data. Our model comprises two vital components: a novel Feature-based Cycle-aware Time Positional Encoding (FCPE) that adeptly captures the cyclical nature of time, and a hierarchical multi-scale temporal attention mechanism. These scales are determined by a bottom-up clustering algorithm. Extensive experiments on several real-world datasets show that our XTSFormer outperforms several baseline methods in prediction performance.
- [732] arXiv:2402.02262 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Data Quality Matters: Suicide Intention Detection on Social Media Posts Using a RoBERTa-CNN ModelComments: 4 pages, 1 figure, 4 tablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Suicide remains a global health concern for the field of health, which urgently needs innovative approaches for early detection and intervention. In this paper, we focus on identifying suicidal intentions in SuicideWatch Reddit posts and present a novel approach to suicide detection using the cutting-edge RoBERTa-CNN model, a variant of RoBERTa (Robustly optimized BERT approach). RoBERTa is used for various Natural Language Processing (NLP) tasks, including text classification and sentiment analysis. The effectiveness of the RoBERTa lies in its ability to capture textual information and form semantic relationships within texts. By adding the Convolution Neural Network (CNN) layer to the original model, the RoBERTa enhances its ability to capture important patterns from heavy datasets. To evaluate the RoBERTa-CNN, we experimented on the Suicide and Depression Detection dataset and obtained solid results. For example, RoBERTa-CNN achieves 98% mean accuracy with the standard deviation (STD) of 0.0009. It also reaches over 97.5% mean AUC value with an STD of 0.0013. In the meanwhile, RoBERTa-CNN outperforms competitive methods, demonstrating the robustness and ability to capture nuanced linguistic patterns for suicidal intentions. Therefore, RoBERTa-CNN can detect suicide intention on text data very well.
- [733] arXiv:2402.02263 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: MixedNUTS: Training-Free Accuracy-Robustness Balance via Nonlinearly Mixed ClassifiersSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Adversarial robustness often comes at the cost of degraded accuracy, impeding the real-life application of robust classification models. Training-based solutions for better trade-offs are limited by incompatibilities with already-trained high-performance large models, necessitating the exploration of training-free ensemble approaches. Observing that robust models are more confident in correct predictions than in incorrect ones on clean and adversarial data alike, we speculate amplifying this "benign confidence property" can reconcile accuracy and robustness in an ensemble setting. To achieve so, we propose "MixedNUTS", a training-free method where the output logits of a robust classifier and a standard non-robust classifier are processed by nonlinear transformations with only three parameters, which are optimized through an efficient algorithm. MixedNUTS then converts the transformed logits into probabilities and mixes them as the overall output. On CIFAR-10, CIFAR-100, and ImageNet datasets, experimental results with custom strong adaptive attacks demonstrate MixedNUTS's vastly improved accuracy and near-SOTA robustness -- it boosts CIFAR-100 clean accuracy by 7.86 points, sacrificing merely 0.87 points in robust accuracy.
- [734] arXiv:2402.02268 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Federated Learning with New Knowledge: Fundamentals, Advances, and FuturesComments: 10 pagesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Federated Learning (FL) is a privacy-preserving distributed learning approach that is rapidly developing in an era where privacy protection is increasingly valued. It is this rapid development trend, along with the continuous emergence of new demands for FL in the real world, that prompts us to focus on a very important problem: Federated Learning with New Knowledge. The primary challenge here is to effectively incorporate various new knowledge into existing FL systems and evolve these systems to reduce costs, extend their lifespan, and facilitate sustainable development. In this paper, we systematically define the main sources of new knowledge in FL, including new features, tasks, models, and algorithms. For each source, we thoroughly analyze and discuss how to incorporate new knowledge into existing FL systems and examine the impact of the form and timing of new knowledge arrival on the incorporation process. Furthermore, we comprehensively discuss the potential future directions for FL with new knowledge, considering a variety of factors such as scenario setups, efficiency, and security. There is also a continuously updating repository for this topic: this https URL .
- [735] arXiv:2402.02285 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: SynthDST: Synthetic Data is All You Need for Few-Shot Dialog State TrackingAtharva Kulkarni , Bo-Hsiang Tseng , Joel Ruben Antony Moniz , Dhivya Piraviperumal , Hong Yu , Shruti BhargavaComments: 9 pages. 4 figures, EACL 2024 main conferenceSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: In-context learning with Large Language Models (LLMs) has emerged as a promising avenue of research in Dialog State Tracking (DST). However, the best-performing in-context learning methods involve retrieving and adding similar examples to the prompt, requiring access to labeled training data. Procuring such training data for a wide range of domains and applications is time-consuming, expensive, and, at times, infeasible. While zero-shot learning requires no training data, it significantly lags behind the few-shot setup. Thus, `\textit{Can we efficiently generate synthetic data for any dialogue schema to enable few-shot prompting?}' Addressing this question, we propose \method, a data generation framework tailored for DST, utilizing LLMs. Our approach only requires the dialogue schema and a few hand-crafted dialogue templates to synthesize natural, coherent, and free-flowing dialogues with DST annotations. Few-shot learning using data from {\method} results in $4-5%$ improvement in Joint Goal Accuracy over the zero-shot baseline on MultiWOZ 2.1 and 2.4. Remarkably, our few-shot learning approach recovers nearly $98%$ of the performance compared to the few-shot setup using human-annotated training data. Our synthetic data and code can be accessed at this https URL
- [736] arXiv:2402.02286 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Multi-Level Aggregation and Recursive Alignment Architecture for Efficient Parallel Inference Segmentation NetworkComments: 15 pages, 9 figures and 12 Tables. Manuscript completed on April 30, 2022Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Real-time semantic segmentation is a crucial research for real-world applications. However, many methods lay particular emphasis on reducing the computational complexity and model size, while largely sacrificing the accuracy. To tackle this problem, we propose a parallel inference network customized for semantic segmentation tasks to achieve a good trade-off between speed and accuracy. We employ a shallow backbone to ensure real-time speed, and propose three core components to compensate for the reduced model capacity to improve accuracy. Specifically, we first design a dual-pyramidal path architecture (Multi-level Feature Aggregation Module, MFAM) to aggregate multi-level features from the encoder to each scale, providing hierarchical clues for subsequent spatial alignment and corresponding in-network inference. Then, we build Recursive Alignment Module (RAM) by combining the flow-based alignment module with recursive upsampling architecture for accurate spatial alignment between multi-scale feature maps with half the computational complexity of the straightforward alignment method. Finally, we perform independent parallel inference on the aligned features to obtain multi-scale scores, and adaptively fuse them through an attention-based Adaptive Scores Fusion Module (ASFM) so that the final prediction can favor objects of multiple scales. Our framework shows a better balance between speed and accuracy than state-of-the-art real-time methods on Cityscapes and CamVid datasets. We also conducted systematic ablation studies to gain insight into our motivation and architectural design. Code is available at: this https URL .
- [737] arXiv:2402.02287 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Future Directions in Foundations of Graph Machine LearningChristopher Morris , Nadav Dym , Haggai Maron , İsmail İlkan Ceylan , Fabrizio Frasca , Ron Levie , Derek Lim , Michael Bronstein , Martin Grohe , Stefanie JegelkaSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Discrete Mathematics (cs.DM); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Abstract: Machine learning on graphs, especially using graph neural networks (GNNs), has seen a surge in interest due to the wide availability of graph data across a broad spectrum of disciplines, from life to social and engineering sciences. Despite their practical success, our theoretical understanding of the properties of GNNs remains highly incomplete. Recent theoretical advancements primarily focus on elucidating the coarse-grained expressive power of GNNs, predominantly employing combinatorial techniques. However, these studies do not perfectly align with practice, particularly in understanding the generalization behavior of GNNs when trained with stochastic first-order optimization techniques. In this position paper, we argue that the graph machine learning community needs to shift its attention to developing a more balanced theory of graph machine learning, focusing on a more thorough understanding of the interplay of expressive power, generalization, and optimization.
- [738] arXiv:2402.02289 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: SemPool: Simple, robust, and interpretable KG pooling for enhancing language modelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Knowledge Graph (KG) powered question answering (QA) performs complex reasoning over language semantics as well as knowledge facts. Graph Neural Networks (GNNs) learn to aggregate information from the underlying KG, which is combined with Language Models (LMs) for effective reasoning with the given question. However, GNN-based methods for QA rely on the graph information of the candidate answer nodes, which limits their effectiveness in more challenging settings where critical answer information is not included in the KG. We propose a simple graph pooling approach that learns useful semantics of the KG that can aid the LM's reasoning and that its effectiveness is robust under graph perturbations. Our method, termed SemPool, represents KG facts with pre-trained LMs, learns to aggregate their semantic information, and fuses it at different layers of the LM. Our experimental results show that SemPool outperforms state-of-the-art GNN-based methods by 2.27% accuracy points on average when answer information is missing from the KG. In addition, SemPool offers interpretability on what type of graph information is fused at different LM layers.
- [739] arXiv:2402.02314 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Selecting Large Language Model to Fine-tune via Rectified Scaling LawHaowei Lin , Baizhou Huang , Haotian Ye , Qinyu Chen , Zihao Wang , Sujian Li , Jianzhu Ma , Xiaojun Wan , James Zou , Yitao LiangSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: The ever-growing ecosystem of LLMs has posed a challenge in selecting the most appropriate pre-trained model to fine-tune amidst a sea of options. Given constrained resources, fine-tuning all models and making selections afterward is unrealistic. In this work, we formulate this resource-constrained selection task into predicting fine-tuning performance and illustrate its natural connection with scaling laws. Unlike pre-training, We find that the fine-tuning scaling curve includes not just the well-known "power phase" but also the previously unobserved "pre-power phase". We also explain why existing scaling laws fail to capture this phase transition phenomenon both theoretically and empirically. To address this, we introduce the concept of "pre-learned data size" into our rectified scaling law, which overcomes theoretical limitations and fits experimental results much better. By leveraging our law, we propose a novel LLM selection algorithm that selects the near-optimal model with hundreds of times less resource consumption, while other methods may provide negatively correlated selection.
- [740] arXiv:2402.02334 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Arithmetic Feature Interaction Is Necessary for Deep Tabular LearningComments: 11 pages, 8 figures, to be published to AAAI2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Until recently, the question of the effective inductive bias of deep models on tabular data has remained unanswered. This paper investigates the hypothesis that arithmetic feature interaction is necessary for deep tabular learning. To test this point, we create a synthetic tabular dataset with a mild feature interaction assumption and examine a modified transformer architecture enabling arithmetical feature interactions, referred to as AMFormer. Results show that AMFormer outperforms strong counterparts in fine-grained tabular data modeling, data efficiency in training, and generalization. This is attributed to its parallel additive and multiplicative attention operators and prompt-based optimization, which facilitate the separation of tabular samples in an extended space with arithmetically-engineered features. Our extensive experiments on real-world data also validate the consistent effectiveness, efficiency, and rationale of AMFormer, suggesting it has established a strong inductive bias for deep learning on tabular data. Code is available at this https URL .
- [741] arXiv:2402.02339 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Uncertainty-Aware Testing-Time Optimization for 3D Human Pose EstimationSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Although data-driven methods have achieved success in 3D human pose estimation, they often suffer from domain gaps and exhibit limited generalization. In contrast, optimization-based methods excel in fine-tuning for specific cases but are generally inferior to data-driven methods in overall performance. We observe that previous optimization-based methods commonly rely on projection constraint, which only ensures alignment in 2D space, potentially leading to the overfitting problem. To address this, we propose an Uncertainty-Aware testing-time Optimization (UAO) framework, which keeps the prior information of pre-trained model and alleviates the overfitting problem using the uncertainty of joints. Specifically, during the training phase, we design an effective 2D-to-3D network for estimating the corresponding 3D pose while quantifying the uncertainty of each 3D joint. For optimization during testing, the proposed optimization framework freezes the pre-trained model and optimizes only a latent state. Projection loss is then employed to ensure the generated poses are well aligned in 2D space for high-quality optimization. Furthermore, we utilize the uncertainty of each joint to determine how much each joint is allowed for optimization. The effectiveness and superiority of the proposed framework are validated through extensive experiments on two challenging datasets: Human3.6M and MPI-INF-3DHP. Notably, our approach outperforms the previous best result by a large margin of 4.5% on Human3.6M. Our source code will be open-sourced.
- [742] arXiv:2402.02342 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: MetaOptimize: A Framework for Optimizing Step Sizes and Other Meta-parametersSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Abstract: This paper addresses the challenge of optimizing meta-parameters (i.e., hyperparameters) in machine learning algorithms, a critical factor influencing training efficiency and model performance. Moving away from the computationally expensive traditional meta-parameter search methods, we introduce MetaOptimize framework that dynamically adjusts meta-parameters, particularly step sizes (also known as learning rates), during training. More specifically, MetaOptimize can wrap around any first-order optimization algorithm, tuning step sizes on the fly to minimize a specific form of regret that accounts for long-term effect of step sizes on training, through a discounted sum of future losses. We also introduce low complexity variants of MetaOptimize that, in conjunction with its adaptability to multiple optimization algorithms, demonstrate performance competitive to those of best hand-crafted learning rate schedules across various machine learning applications.
- [743] arXiv:2402.02345 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Stereographic Spherical Sliced Wasserstein DistancesHuy Tran , Yikun Bai , Abihith Kothapalli , Ashkan Shahbazi , Xinran Liu , Rocio Diaz Martin , Soheil KolouriSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Abstract: Comparing spherical probability distributions is of great interest in various fields, including geology, medical domains, computer vision, and deep representation learning. The utility of optimal transport-based distances, such as the Wasserstein distance, for comparing probability measures has spurred active research in developing computationally efficient variations of these distances for spherical probability measures. This paper introduces a high-speed and highly parallelizable distance for comparing spherical measures using the stereographic projection and the generalized Radon transform, which we refer to as the Stereographic Spherical Sliced Wasserstein (S3W) distance. We carefully address the distance distortion caused by the stereographic projection and provide an extensive theoretical analysis of our proposed metric and its rotationally invariant variation. Finally, we evaluate the performance of the proposed metrics and compare them with recent baselines in terms of both speed and accuracy through a wide range of numerical studies, including gradient flows and self-supervised learning.
- [744] arXiv:2402.02362 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Unification of Symmetries Inside Neural Networks: Transformer, Feedforward and Neural ODEComments: 11 pages, 3 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); High Energy Physics - Theory (hep-th); Computational Physics (physics.comp-ph)
Abstract: Understanding the inner workings of neural networks, including transformers, remains one of the most challenging puzzles in machine learning. This study introduces a novel approach by applying the principles of gauge symmetries, a key concept in physics, to neural network architectures. By regarding model functions as physical observables, we find that parametric redundancies of various machine learning models can be interpreted as gauge symmetries. We mathematically formulate the parametric redundancies in neural ODEs, and find that their gauge symmetries are given by spacetime diffeomorphisms, which play a fundamental role in Einstein's theory of gravity. Viewing neural ODEs as a continuum version of feedforward neural networks, we show that the parametric redundancies in feedforward neural networks are indeed lifted to diffeomorphisms in neural ODEs. We further extend our analysis to transformer models, finding natural correspondences with neural ODEs and their gauge symmetries. The concept of gauge symmetries sheds light on the complex behavior of deep learning models through physics and provides us with a unifying perspective for analyzing various machine learning architectures.
- [745] arXiv:2402.02364 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: The Developmental Landscape of In-Context LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: We show that in-context learning emerges in transformers in discrete developmental stages, when they are trained on either language modeling or linear regression tasks. We introduce two methods for detecting the milestones that separate these stages, by probing the geometry of the population loss in both parameter space and function space. We study the stages revealed by these new methods using a range of behavioral and structural metrics to establish their validity.
- [746] arXiv:2402.02367 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Exploring Intrinsic Properties of Medical Images for Self-Supervised Binary Semantic SegmentationComments: 30 pages, 10 figures, and 10 tables. Under ReviewSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Recent advancements in self-supervised learning have unlocked the potential to harness unlabeled data for auxiliary tasks, facilitating the learning of beneficial priors. This has been particularly advantageous in fields like medical image analysis, where labeled data are scarce. Although effective for classification tasks, this methodology has shown limitations in more complex applications, such as medical image segmentation. In this paper, we introduce Medical imaging Enhanced with Dynamic Self-Adaptive Semantic Segmentation (MedSASS), a dedicated self-supervised framework tailored for medical image segmentation. We evaluate MedSASS against existing state-of-the-art methods across four diverse medical datasets, showcasing its superiority. MedSASS outperforms existing CNN-based self-supervised methods by 3.83% and matches the performance of ViT-based methods. Furthermore, when MedSASS is trained end-to-end, covering both encoder and decoder, it demonstrates significant improvements of 14.4% for CNNs and 6% for ViT-based architectures compared to existing state-of-the-art self-supervised strategies.
- [747] arXiv:2402.02380 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Evaluating Large Language Models in Analysing Classroom DialogueSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Abstract: This study explores the application of Large Language Models (LLMs), specifically GPT-4, in the analysis of classroom dialogue, a crucial research task for both teaching diagnosis and quality improvement. Recognizing the knowledge-intensive and labor-intensive nature of traditional qualitative methods in educational research, this study investigates the potential of LLM to streamline and enhance the analysis process. The study involves datasets from a middle school, encompassing classroom dialogues across mathematics and Chinese classes. These dialogues were manually coded by educational experts and then analyzed using a customised GPT-4 model. This study focuses on comparing manual annotations with the outputs of GPT-4 to evaluate its efficacy in analyzing educational dialogues. Time efficiency, inter-coder agreement, and inter-coder reliability between human coders and GPT-4 are evaluated. Results indicate substantial time savings with GPT-4, and a high degree of consistency in coding between the model and human coders, with some discrepancies in specific codes. These findings highlight the strong potential of LLM in teaching evaluation and facilitation.
- [748] arXiv:2402.02381 (cross-list from cs.NI) [ pdf , ps , other ]
-
Title: Empowering Computing and Networks Convergence System with Distributed Cooperative RoutingComments: Submit to IEEE NetworkSubjects: Networking and Internet Architecture (cs.NI) ; Artificial Intelligence (cs.AI)
Abstract: The emergence of intelligent applications and recent advances in the fields of computing and networks are driving the development of computing and networks convergence (CNC) system. However, existing researches failed to achieve comprehensive scheduling optimization of computing and network resources. This shortfall results in some requirements of computing requests unable to be guaranteed in an end-to-end service pattern, negatively impacting the development of CNC systems. In this article, we propose a distributed cooperative routing framework for the CNC system to ensure the deadline requirements and minimize the computation cost of requests. The framework includes trading plane, management plane, control plane and forwarding plane. The cross-plane cooperative end-to-end routing schemes consider both computation efficiency of heterogeneous servers and the network congestion degrees while making routing plan, thereby determining where to execute requests and corresponding routing paths. Simulations results substantiates the performance of our routing schemes in scheduling computing requests in the CNC system.
- [749] arXiv:2402.02385 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: A Survey on Robotics with Foundation Models: toward Embodied AISubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: While the exploration for embodied AI has spanned multiple decades, it remains a persistent challenge to endow agents with human-level intelligence, including perception, learning, reasoning, decision-making, control, and generalization capabilities, so that they can perform general-purpose tasks in open, unstructured, and dynamic environments. Recent advances in computer vision, natural language processing, and multi-modality learning have shown that the foundation models have superhuman capabilities for specific tasks. They not only provide a solid cornerstone for integrating basic modules into embodied AI systems but also shed light on how to scale up robot learning from a methodological perspective. This survey aims to provide a comprehensive and up-to-date overview of foundation models in robotics, focusing on autonomous manipulation and encompassing high-level planning and low-level control. Moreover, we showcase their commonly used datasets, simulators, and benchmarks. Importantly, we emphasize the critical challenges intrinsic to this field and delineate potential avenues for future research, contributing to advancing the frontier of academic and industrial discourse.
- [750] arXiv:2402.02388 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Solution-oriented Agent-based Models Generation with Verifier-assisted Iterative In-context LearningJournal-ref: International Conference on Autonomous Agents and Multiagent Systems 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Software Engineering (cs.SE)
Abstract: Agent-based models (ABMs) stand as an essential paradigm for proposing and validating hypothetical solutions or policies aimed at addressing challenges posed by complex systems and achieving various objectives. This process demands labor-intensive endeavors and multidisciplinary expertise. Large language models (LLMs) encapsulating cross-domain knowledge and programming proficiency could potentially alleviate the difficulty of this process. However, LLMs excel in handling sequential information, making it challenging for analyzing the intricate interactions and nonlinear dynamics inherent in ABMs. Additionally, due to the lack of self-evaluation capability of LLMs, relying solely on LLMs is insufficient to effectively accomplish this process. In this paper, we present SAGE, a general solution-oriented ABM generation framework designed for automatic modeling and generating solutions for targeted problems. Unlike approaches reliant on expert handcrafting or resource-intensive neural network training, SAGE establishes a verifier-assisted iterative in-context learning process employing large language models (LLMs) to leverages their inherent cross-domain knowledge for tackling intricate demands from diverse domain scenarios. In SAGE, we introduce an semi-structured conceptual representation expliciting the intricate structures of ABMs and an objective representation to guide LLMs in modeling scenarios and proposing hypothetical solutions through in-context learning. To ensure the model executability and solution feasibility, SAGE devises a two-level verifier with chain-of-thought prompting tailored to the complex interactions and non-linear dynamics of ABMs, driving the iterative generation optimization. Moreover, we construct an evaluation dataset of solution-oriented ABMs from open this http URL contains practical models across various domains.
- [751] arXiv:2402.02389 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: KICGPT: Large Language Model with Knowledge in Context for Knowledge Graph CompletionComments: Accepted to EMNLP 2023 FindingsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Knowledge Graph Completion (KGC) is crucial for addressing knowledge graph incompleteness and supporting downstream applications. Many models have been proposed for KGC. They can be categorized into two main classes: triple-based and text-based approaches. Triple-based methods struggle with long-tail entities due to limited structural information and imbalanced entity distributions. Text-based methods alleviate this issue but require costly training for language models and specific finetuning for knowledge graphs, which limits their efficiency. To alleviate these limitations, in this paper, we propose KICGPT, a framework that integrates a large language model (LLM) and a triple-based KGC retriever. It alleviates the long-tail problem without incurring additional training overhead. KICGPT uses an in-context learning strategy called Knowledge Prompt, which encodes structural knowledge into demonstrations to guide the LLM. Empirical results on benchmark datasets demonstrate the effectiveness of KICGPT with smaller training overhead and no finetuning.
- [752] arXiv:2402.02399 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: FreDF: Learning to Forecast in Frequency DomainHao Wang , Licheng Pan , Zhichao Chen , Degui Yang , Sen Zhang , Yifei Yang , Xinggao Liu , Haoxuan Li , Dacheng TaoSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Applications (stat.AP); Machine Learning (stat.ML)
Abstract: Time series modeling is uniquely challenged by the presence of autocorrelation in both historical and label sequences. Current research predominantly focuses on handling autocorrelation within the historical sequence but often neglects its presence in the label sequence. Specifically, emerging forecast models mainly conform to the direct forecast (DF) paradigm, generating multi-step forecasts under the assumption of conditional independence within the label sequence. This assumption disregards the inherent autocorrelation in the label sequence, thereby limiting the performance of DF-based models. In response to this gap, we introduce the Frequency-enhanced Direct Forecast (FreDF), which bypasses the complexity of label autocorrelation by learning to forecast in the frequency domain. Our experiments demonstrate that FreDF substantially outperforms existing state-of-the-art methods including iTransformer and is compatible with a variety of forecast models.
- [753] arXiv:2402.02401 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: AI-Generated Content Enhanced Computer-Aided Diagnosis Model for Thyroid Nodules: A ChatGPT-Style AssistantJincao Yao (1 and 2 and 3 and 4 and 5 and 6), Yunpeng Wang (7), Zhikai Lei (8), Kai Wang (9), Xiaoxian Li (10) Jianhua Zhou (10), Xiang Hao (7), Jiafei Shen (1 and 2), Zhenping Wang (9), Rongrong Ru (11), Yaqing Chen (11), Yahan Zhou (6), Chen Chen (1 and 2), Yanming Zhang (12 and 13), Ping Liang (14), Dong Xu (1 and 2 and 3 and 4 and 5 and 6) ((1) Department of Radiology, Zhejiang Cancer Hospital, Hangzhou, 310022, China (2) Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, Hangzhou, 310000, China,(3) Key Laboratory of Head and Neck Cancer Translational Research of Zhejiang Province, Hangzhou, 310022, China,(4) Zhejiang Provincial Research Center for Cancer Intelligent Diagnosis and Molecular Technology, Hangzhou, 310000, China, (5) Wenling Medical Big Data and Artificial Intelligence Research Institute, 24th Floor, Machang Road, Taizhou, 310061, China,(6) Taizhou Key Laboratory of Minimally Invasive Interventional Therapy and Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), Taizhou, 317502, China,(7) College of Optical Science and Engineering, Zhejiang University, No.38 of Zheda Road, Hangzhou, Zhejiang Province, China,(8) Zhejiang Provincial Hospital of Chinese Medicine, 54 Youdian Road, Hangzhou, 310003, China,(9) Department of Ultrasound, The Affiliated Dongyang Hospital of Wenzhou Medical University, Dongyang, 322100, China,(10) Department of Ultrasound, Sun Yat sen University Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangzhou, 510060, China, (11) Affiliated Xiaoshan Hospital, Hangzhou Normal University, No.728 North Yucai Road, Hangzhou, 311202, China,(12) Zhejiang Provincial People's Hospital Affiliated People's Hospital, Hangzhou Medical College, Hangzhou, 314408, China,(13) Key Laboratory of Endocrine Gland Diseases of Zhejiang Province, Hangzhou, 314408, China,(14) Department of Ultrasound, Chinese PLA General Hospital, Chinese PLA Medical School, Beijing, 100853, China)Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: An artificial intelligence-generated content-enhanced computer-aided diagnosis (AIGC-CAD) model, designated as ThyGPT, has been developed. This model, inspired by the architecture of ChatGPT, could assist radiologists in assessing the risk of thyroid nodules through semantic-level human-machine interaction. A dataset comprising 19,165 thyroid nodule ultrasound cases from Zhejiang Cancer Hospital was assembled to facilitate the training and validation of the model. After training, ThyGPT could automatically evaluate thyroid nodule and engage in effective communication with physicians through human-computer interaction. The performance of ThyGPT was rigorously quantified using established metrics such as the receiver operating characteristic (ROC) curve, area under the curve (AUC), sensitivity, and specificity. The empirical findings revealed that radiologists, when supplemented with ThyGPT, markedly surpassed the diagnostic acumen of their peers utilizing traditional methods as well as the performance of the model in isolation. These findings suggest that AIGC-CAD systems, exemplified by ThyGPT, hold the promise to fundamentally transform the diagnostic workflows of radiologists in forthcoming years.
- [754] arXiv:2402.02416 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Aligner: Achieving Efficient Alignment through Weak-to-Strong CorrectionJiaming Ji , Boyuan Chen , Hantao Lou , Donghai Hong , Borong Zhang , Xuehai Pan , Juntao Dai , Yaodong YangComments: 34 pagesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Efforts to align Large Language Models (LLMs) are mainly conducted via Reinforcement Learning from Human Feedback (RLHF) methods. However, RLHF encounters major challenges including training reward models, actor-critic engineering, and importantly, it requires access to LLM parameters. Here we introduce Aligner, a new efficient alignment paradigm that bypasses the whole RLHF process by learning the correctional residuals between the aligned and the unaligned answers. Our Aligner offers several key advantages. Firstly, it is an autoregressive seq2seq model that is trained on the query-answer-correction dataset via supervised learning; this offers a parameter-efficient alignment solution with minimal resources. Secondly, the Aligner facilitates weak-to-strong generalization; finetuning large pretrained models by Aligner's supervisory signals demonstrates strong performance boost. Thirdly, Aligner functions as a model-agnostic plug-and-play module, allowing for its direct application on different open-source and API-based models. Remarkably, Aligner-7B improves 11 different LLMs by 21.9% in helpfulness and 23.8% in harmlessness on average (GPT-4 by 17.5% and 26.9%). When finetuning (strong) Llama2-70B with (weak) Aligner-13B's supervision, we can improve Llama2 by 8.2% in helpfulness and 61.6% in harmlessness. See our dataset and code at this https URL
- [755] arXiv:2402.02420 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Factuality of Large Language Models in the Year 2024Yuxia Wang , Minghan Wang , Muhammad Arslan Manzoor , Fei Liu , Georgi Georgiev , Rocktim Jyoti Das , Preslav NakovComments: 9 pages, 1 figure and 2 tablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large language models (LLMs), especially when instruction-tuned for chat, have become part of our daily lives, freeing people from the process of searching, extracting, and integrating information from multiple sources by offering a straightforward answer to a variety of questions in a single place. Unfortunately, in many cases, LLM responses are factually incorrect, which limits their applicability in real-world scenarios. As a result, research on evaluating and improving the factuality of LLMs has attracted a lot of research attention recently. In this survey, we critically analyze existing work with the aim to identify the major challenges and their associated causes, pointing out to potential solutions for improving the factuality of LLMs, and analyzing the obstacles to automated factuality evaluation for open-ended text generation. We further offer an outlook on where future research should go.
- [756] arXiv:2402.02423 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human FeedbackYifu Yuan , Jianye Hao , Yi Ma , Zibin Dong , Hebin Liang , Jinyi Liu , Zhixin Feng , Kai Zhao , Yan ZhengComments: Published as a conference paper at ICLR 2024. The website is available at this https URLSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Robotics (cs.RO)
Abstract: Reinforcement Learning with Human Feedback (RLHF) has received significant attention for performing tasks without the need for costly manual reward design by aligning human preferences. It is crucial to consider diverse human feedback types and various learning methods in different environments. However, quantifying progress in RLHF with diverse feedback is challenging due to the lack of standardized annotation platforms and widely used unified benchmarks. To bridge this gap, we introduce Uni-RLHF, a comprehensive system implementation tailored for RLHF. It aims to provide a complete workflow from real human feedback, fostering progress in the development of practical problems. Uni-RLHF contains three packages: 1) a universal multi-feedback annotation platform, 2) large-scale crowdsourced feedback datasets, and 3) modular offline RLHF baseline implementations. Uni-RLHF develops a user-friendly annotation interface tailored to various feedback types, compatible with a wide range of mainstream RL environments. We then establish a systematic pipeline of crowdsourced annotations, resulting in large-scale annotated datasets comprising more than 15 million steps across 30+ popular tasks. Through extensive experiments, the results in the collected datasets demonstrate competitive performance compared to those from well-designed manual rewards. We evaluate various design choices and offer insights into their strengths and potential areas of improvement. We wish to build valuable open-source platforms, datasets, and baselines to facilitate the development of more robust and reliable RLHF solutions based on realistic human feedback. The website is available at this https URL .
- [757] arXiv:2402.02439 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory StitchingSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In offline reinforcement learning (RL), the performance of the learned policy highly depends on the quality of offline datasets. However, in many cases, the offline dataset contains very limited optimal trajectories, which poses a challenge for offline RL algorithms as agents must acquire the ability to transit to high-reward regions. To address this issue, we introduce Diffusion-based Trajectory Stitching (DiffStitch), a novel diffusion-based data augmentation pipeline that systematically generates stitching transitions between trajectories. DiffStitch effectively connects low-reward trajectories with high-reward trajectories, forming globally optimal trajectories to address the challenges faced by offline RL algorithms. Empirical experiments conducted on D4RL datasets demonstrate the effectiveness of DiffStitch across RL methodologies. Notably, DiffStitch demonstrates substantial enhancements in the performance of one-step methods (IQL), imitation learning methods (TD3+BC), and trajectory optimization methods (DT).
- [758] arXiv:2402.02441 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: TopoX: A Suite of Python Packages for Machine Learning on Topological DomainsMustafa Hajij , Mathilde Papillon , Florian Frantzen , Jens Agerberg , Ibrahem AlJabea , Ruben Ballester , Claudio Battiloro , Guillermo Bernárdez , Tolga Birdal , Aiden Brent , Peter Chin , Sergio Escalera , Simone Fiorellino , Odin Hoff Gardaa , Gurusankar Gopalakrishnan , Devendra Govil , Josef Hoppe , Maneel Reddy Karri , Jude Khouja , Manuel Lecha , Neal Livesay , Jan Meißner , Soham Mukherjee , Alexander Nikitin , Theodore Papamarkou , Jaro Prílepok , Karthikeyan Natesan Ramamurthy , Paul Rosen , Aldo Guzmán-Sáenz , Alessandro Salatiello , Shreyas N. Samaga , Simone Scardapane , Michael T. Schaub , Luca Scofano , Indro Spinelli , Lev Telyatnikov , Quang Truong , Robin Walters , Maosheng Yang , Olga Zaghen , Ghada Zamzmi , Ali Zia , Nina MiolaneSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Mathematical Software (cs.MS); Computation (stat.CO)
Abstract: We introduce TopoX, a Python software suite that provides reliable and user-friendly building blocks for computing and machine learning on topological domains that extend graphs: hypergraphs, simplicial, cellular, path and combinatorial complexes. TopoX consists of three packages: TopoNetX facilitates constructing and computing on these domains, including working with nodes, edges and higher-order cells; TopoEmbedX provides methods to embed topological domains into vector spaces, akin to popular graph-based embedding algorithms such as node2vec; TopoModelx is built on top of PyTorch and offers a comprehensive toolbox of higher-order message passing functions for neural networks on topological domains. The extensively documented and unit-tested source code of TopoX is available under MIT license at this https URL .
- [759] arXiv:2402.02452 (cross-list from cs.CR) [ pdf , ps , html , other ]
-
Title: XAI-CF -- Examining the Role of Explainable Artificial Intelligence in Cyber ForensicsSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: With the rise of complex cyber devices Cyber Forensics (CF) is facing many new challenges. For example, there are dozens of systems running on smartphones, each with more than millions of downloadable applications. Sifting through this large amount of data and making sense requires new techniques, such as from the field of Artificial Intelligence (AI). To apply these techniques successfully in CF, we need to justify and explain the results to the stakeholders of CF, such as forensic analysts and members of the court, for them to make an informed decision. If we want to apply AI successfully in CF, there is a need to develop trust in AI systems. Some other factors in accepting the use of AI in CF are to make AI authentic, interpretable, understandable, and interactive. This way, AI systems will be more acceptable to the public and ensure alignment with legal standards. An explainable AI (XAI) system can play this role in CF, and we call such a system XAI-CF. XAI-CF is indispensable and is still in its infancy. In this paper, we explore and make a case for the significance and advantages of XAI-CF. We strongly emphasize the need to build a successful and practical XAI-CF system and discuss some of the main requirements and prerequisites of such a system. We present a formal definition of the terms CF and XAI-CF and a comprehensive literature review of previous works that apply and utilize XAI to build and increase trust in CF. We discuss some challenges facing XAI-CF. We also provide some concrete solutions to these challenges. We identify key insights and future research directions for building XAI applications for CF. This paper is an effort to explore and familiarize the readers with the role of XAI applications in CF, and we believe that our work provides a promising basis for future researchers interested in XAI-CF.
- [760] arXiv:2402.02460 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Review of multimodal machine learning approaches in healthcareComments: 5 figures, 5 tablesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Machine learning methods in healthcare have traditionally focused on using data from a single modality, limiting their ability to effectively replicate the clinical practice of integrating multiple sources of information for improved decision making. Clinicians typically rely on a variety of data sources including patients' demographic information, laboratory data, vital signs and various imaging data modalities to make informed decisions and contextualise their findings. Recent advances in machine learning have facilitated the more efficient incorporation of multimodal data, resulting in applications that better represent the clinician's approach. Here, we provide a review of multimodal machine learning approaches in healthcare, offering a comprehensive overview of recent literature. We discuss the various data modalities used in clinical diagnosis, with a particular emphasis on imaging data. We evaluate fusion techniques, explore existing multimodal datasets and examine common training strategies.
- [761] arXiv:2402.02464 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: A Graph is Worth $K$ Words: Euclideanizing Graph using Pure TransformerSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Abstract: Can we model non-Euclidean graphs as pure language or even Euclidean vectors while retaining their inherent information? The non-Euclidean property have posed a long term challenge in graph modeling. Despite recent GNN and Graphformer efforts encoding graphs as Euclidean vectors, recovering original graph from the vectors remains a challenge. We introduce GraphsGPT, featuring a Graph2Seq encoder that transforms non-Euclidean graphs into learnable graph words in a Euclidean space, along with a GraphGPT decoder that reconstructs the original graph from graph words to ensure information equivalence. We pretrain GraphsGPT on 100M molecules and yield some interesting findings: (1) Pretrained Graph2Seq excels in graph representation learning, achieving state-of-the-art results on 8/9 graph classification and regression tasks. (2) Pretrained GraphGPT serves as a strong graph generator, demonstrated by its ability to perform both unconditional and conditional graph generation. (3) Graph2Seq+GraphGPT enables effective graph mixup in the Euclidean space, overcoming previously known non-Euclidean challenge. (4) Our proposed novel edge-centric GPT pretraining task is effective in graph fields, underscoring its success in both representation and generation.
- [762] arXiv:2402.02478 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Why are hyperbolic neural networks effective? A study on hierarchical representation capabilitySubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Hyperbolic Neural Networks (HNNs), operating in hyperbolic space, have been widely applied in recent years, motivated by the existence of an optimal embedding in hyperbolic space that can preserve data hierarchical relationships (termed Hierarchical Representation Capability, HRC) more accurately than Euclidean space. However, there is no evidence to suggest that HNNs can achieve this theoretical optimal embedding, leading to much research being built on flawed motivations. In this paper, we propose a benchmark for evaluating HRC and conduct a comprehensive analysis of why HNNs are effective through large-scale experiments. Inspired by the analysis results, we propose several pre-training strategies to enhance HRC and improve the performance of downstream tasks, further validating the reliability of the analysis. Experiments show that HNNs cannot achieve the theoretical optimal embedding. The HRC is significantly affected by the optimization objectives and hierarchical structures, and enhancing HRC through pre-training strategies can significantly improve the performance of HNNs.
- [763] arXiv:2402.02479 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedbackGaurav Pandey , Yatin Nandwani , Tahira Naseem , Mayank Mishra , Guangxuan Xu , Dinesh Raghu , Sachindra Joshi , Asim Munawar , Ramón Fernandez AstudilloComments: Under reviewSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Abstract: Following the success of Proximal Policy Optimization (PPO) for Reinforcement Learning from Human Feedback (RLHF), new techniques such as Sequence Likelihood Calibration (SLiC) and Direct Policy Optimization (DPO) have been proposed that are offline in nature and use rewards in an indirect manner. These techniques, in particular DPO, have recently become the tools of choice for LLM alignment due to their scalability and performance. However, they leave behind important features of the PPO approach. Methods such as SLiC or RRHF make use of the Reward Model (RM) only for ranking/preference, losing fine-grained information and ignoring the parametric form of the RM (eg., Bradley-Terry, Plackett-Luce), while methods such as DPO do not use even a separate reward model. In this work, we propose a novel approach, named BRAIn, that re-introduces the RM as part of a distribution matching approach.BRAIn considers the LLM distribution conditioned on the assumption of output goodness and applies Bayes theorem to derive an intractable posterior distribution where the RM is explicitly represented. BRAIn then distills this posterior into an amortized inference network through self-normalized importance sampling, leading to a scalable offline algorithm that significantly outperforms prior art in summarization and AntropicHH tasks. BRAIn also has interesting connections to PPO and DPO for specific RM choices.
- [764] arXiv:2402.02498 (cross-list from eess.IV) [ pdf , ps , html , other ]
-
Title: Fully Differentiable Correlation-driven 2D/3D Registration for X-ray to CT Image FusionComments: ISBI 2024Subjects: Image and Video Processing (eess.IV) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Image-based rigid 2D/3D registration is a critical technique for fluoroscopic guided surgical interventions. In recent years, some learning-based fully differentiable methods have produced beneficial outcomes while the process of feature extraction and gradient flow transmission still lack controllability and interpretability. To alleviate these problems, in this work, we propose a novel fully differentiable correlation-driven network using a dual-branch CNN-transformer encoder which enables the network to extract and separate low-frequency global features from high-frequency local features. A correlation-driven loss is further proposed for low-frequency feature and high-frequency feature decomposition based on embedded information. Besides, a training strategy that learns to approximate a convex-shape similarity function is applied in our work. We test our approach on a in-house datasetand show that it outperforms both existing fully differentiable learning-based registration approaches and the conventional optimization-based baseline.
- [765] arXiv:2402.02500 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot LearningSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: In this study, we explore the influence of different observation spaces on robot learning, focusing on three predominant modalities: RGB, RGB-D, and point cloud. Through extensive experimentation on over 17 varied contact-rich manipulation tasks, conducted across two benchmarks and simulators, we have observed a notable trend: point cloud-based methods, even those with the simplest designs, frequently surpass their RGB and RGB-D counterparts in performance. This remains consistent in both scenarios: training from scratch and utilizing pretraining. Furthermore, our findings indicate that point cloud observations lead to improved policy zero-shot generalization in relation to various geometry and visual clues, including camera viewpoints, lighting conditions, noise levels and background appearance. The outcomes suggest that 3D point cloud is a valuable observation modality for intricate robotic tasks. We will open-source all our codes and checkpoints, hoping that our insights can help design more generalizable and robust robotic models.
- [766] arXiv:2402.02513 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Early stopping by correlating online indicators in neural networksComments: 26 pages, 6 figuresJournal-ref: Neural Networks, 159 (2023), pp 109-124. ISSN 1879-2782. ElsevierSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)
Abstract: In order to minimize the generalization error in neural networks, a novel technique to identify overfitting phenomena when training the learner is formally introduced. This enables support of a reliable and trustworthy early stopping condition, thus improving the predictive power of that type of modeling. Our proposal exploits the correlation over time in a collection of online indicators, namely characteristic functions for indicating if a set of hypotheses are met, associated with a range of independent stopping conditions built from a canary judgment to evaluate the presence of overfitting. That way, we provide a formal basis for decision making in terms of interrupting the learning process.
As opposed to previous approaches focused on a single criterion, we take advantage of subsidiarities between independent assessments, thus seeking both a wider operating range and greater diagnostic reliability. With a view to illustrating the effectiveness of the halting condition described, we choose to work in the sphere of natural language processing, an operational continuum increasingly based on machine learning. As a case study, we focus on parser generation, one of the most demanding and complex tasks in the domain. The selection of cross-validation as a canary function enables an actual comparison with the most representative early stopping conditions based on overfitting identification, pointing to a promising start toward an optimal bias and variance control. - [767] arXiv:2402.02515 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Modeling of learning curves with applications to pos taggingComments: 30 pages, 11 figuresJournal-ref: Computer Speech & Language, 41, pp 1-28 (2017). ISSN 0885-2308. ElsevierSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: An algorithm to estimate the evolution of learning curves on the whole of a training data base, based on the results obtained from a portion and using a functional strategy, is introduced. We approximate iteratively the sought value at the desired time, independently of the learning technique used and once a point in the process, called prediction level, has been passed. The proposal proves to be formally correct with respect to our working hypotheses and includes a reliable proximity condition. This allows the user to fix a convergence threshold with respect to the accuracy finally achievable, which extends the concept of stopping criterion and seems to be effective even in the presence of distorting observations.
Our aim is to evaluate the training effort, supporting decision making in order to reduce the need for both human and computational resources during the learning process. The proposal is of interest in at least three operational procedures. The first is the anticipation of accuracy gain, with the purpose of measuring how much work is needed to achieve a certain degree of performance. The second relates the comparison of efficiency between systems at training time, with the objective of completing this task only for the one that best suits our requirements. The prediction of accuracy is also a valuable item of information for customizing systems, since we can estimate in advance the impact of settings on both the performance and the development costs. Using the generation of part-of-speech taggers as an example application, the experimental results are consistent with our expectations. - [768] arXiv:2402.02516 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Adaptive scheduling for adaptive sampling in POS taggers constructionComments: 23 pager, 10 figuresJournal-ref: Computer Speech & Language, 60, 101020 (2020), pp 1-18. ISSN 0885-2308. ElsevierSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: We introduce an adaptive scheduling for adaptive sampling as a novel way of machine learning in the construction of part-of-speech taggers. The goal is to speed up the training on large data sets, without significant loss of performance with regard to an optimal configuration. In contrast to previous methods using a random, fixed or regularly rising spacing between the instances, ours analyzes the shape of the learning curve geometrically in conjunction with a functional model to increase or decrease it at any time. The algorithm proves to be formally correct regarding our working hypotheses. Namely, given a case, the following one is the nearest ensuring a net gain of learning ability from the former, it being possible to modulate the level of requirement for this condition. We also improve the robustness of sampling by paying greater attention to those regions of the training data base subject to a temporary inflation in performance, thus preventing the learning from stopping prematurely.
The proposal has been evaluated on the basis of its reliability to identify the convergence of models, corroborating our expectations. While a concrete halting condition is used for testing, users can choose any condition whatsoever to suit their own specific needs. - [769] arXiv:2402.02519 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: SIMPL: A Simple and Efficient Multi-agent Motion Prediction Baseline for Autonomous DrivingComments: Code is available at this https URLSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: This paper presents a Simple and effIcient Motion Prediction baseLine (SIMPL) for autonomous vehicles. Unlike conventional agent-centric methods with high accuracy but repetitive computations and scene-centric methods with compromised accuracy and generalizability, SIMPL delivers real-time, accurate motion predictions for all relevant traffic participants. To achieve improvements in both accuracy and inference speed, we propose a compact and efficient global feature fusion module that performs directed message passing in a symmetric manner, enabling the network to forecast future motion for all road users in a single feed-forward pass and mitigating accuracy loss caused by viewpoint shifting. Additionally, we investigate the continuous trajectory parameterization using Bernstein basis polynomials in trajectory decoding, allowing evaluations of states and their higher-order derivatives at any desired time point, which is valuable for downstream planning tasks. As a strong baseline, SIMPL exhibits highly competitive performance on Argoverse 1 & 2 motion forecasting benchmarks compared with other state-of-the-art methods. Furthermore, its lightweight design and low inference latency make SIMPL highly extensible and promising for real-world onboard deployment. We open-source the code at this https URL .
- [770] arXiv:2402.02522 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Absolute convergence and error thresholds in non-active adaptive samplingComments: 27 pages, 10 figuresJournal-ref: Journal of Computer and System Sciences, 129 (2020) , pp 39-61. ISSN 1090-2724. ElsevierSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Non-active adaptive sampling is a way of building machine learning models from a training data base which are supposed to dynamically and automatically derive guaranteed sample size. In this context and regardless of the strategy used in both scheduling and generating of weak predictors, a proposal for calculating absolute convergence and error thresholds is described. We not only make it possible to establish when the quality of the model no longer increases, but also supplies a proximity condition to estimate in absolute terms how close it is to achieving such a goal, thus supporting decision making for fine-tuning learning parameters in model selection. The technique proves its correctness and completeness with respect to our working hypotheses, in addition to strengthening the robustness of the sampling scheme. Tests meet our expectations and illustrate the proposal in the domain of natural language processing, taking the generation of part-of-speech taggers as case study.
- [771] arXiv:2402.02541 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Knowledge Generation for Zero-shot Knowledge-based VQAComments: accepted as Findings in EACL 2023Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Previous solutions to knowledge-based visual question answering~(K-VQA) retrieve knowledge from external knowledge bases and use supervised learning to train the K-VQA model. Recently pre-trained LLMs have been used as both a knowledge source and a zero-shot QA model for K-VQA and demonstrated promising results. However, these recent methods do not explicitly show the knowledge needed to answer the questions and thus lack interpretability. Inspired by recent work on knowledge generation from LLMs for text-based QA, in this work we propose and test a similar knowledge-generation-based K-VQA method, which first generates knowledge from an LLM and then incorporates the generated knowledge for K-VQA in a zero-shot manner. We evaluate our method on two K-VQA benchmarks and found that our method performs better than previous zero-shot K-VQA methods and our generated knowledge is generally relevant and helpful.
- [772] arXiv:2402.02544 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language ModelComments: 36 pages, 10 figures. Github this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The revolutionary capabilities of large language models (LLMs) have paved the way for multimodal large language models (MLLMs) and fostered diverse applications across various specialized domains. In the remote sensing (RS) field, however, the diverse geographical landscapes and varied objects in RS imagery are not adequately considered in recent MLLM endeavors. To bridge this gap, we construct a large-scale RS image-text dataset, LHRS-Align, and an informative RS-specific instruction dataset, LHRS-Instruct, leveraging the extensive volunteered geographic information (VGI) and globally available RS images. Building on this foundation, we introduce LHRS-Bot, an MLLM tailored for RS image understanding through a novel multi-level vision-language alignment strategy and a curriculum learning method. Additionally, we introduce LHRS-Bench, a benchmark for thoroughly evaluating MLLMs' abilities in RS image understanding. Comprehensive experiments demonstrate that LHRS-Bot exhibits a profound understanding of RS images and the ability to perform nuanced reasoning within the RS domain.
- [773] arXiv:2402.02548 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: "What's my model inside of?": Exploring the role of environments for grounded natural language understandingComments: PhD ThesisSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Abstract: In contrast to classical cognitive science which studied brains in isolation, ecological approaches focused on the role of the body and environment in shaping cognition. Similarly, in this thesis we adopt an ecological approach to grounded natural language understanding (NLU) research. Grounded language understanding studies language understanding systems situated in the context of events, actions and precepts in naturalistic/simulated virtual environments. Where classic research tends to focus on designing new models and optimization methods while treating environments as given, we explore the potential of environment design for improving data collection and model development. We developed novel training and annotation approaches for procedural text understanding based on text-based game environments. We also drew upon embodied cognitive linguistics literature to propose a roadmap for grounded NLP research, and to inform the development of a new benchmark for measuring the progress of large language models on challenging commonsense reasoning tasks. We leveraged the richer supervision provided by text-based game environments to develop Breakpoint Transformers, a novel approach to modeling intermediate semantic information in long narrative or procedural texts. Finally, we integrated theories on the role of environments in collective human intelligence to propose a design for AI-augmented "social thinking environments" for knowledge workers like scientists.
- [774] arXiv:2402.02549 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Are Large Language Models Table-based Fact-Checkers?Comments: CSCWD 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Table-based Fact Verification (TFV) aims to extract the entailment relation between statements and structured tables. Existing TFV methods based on small-scaled models suffer from insufficient labeled data and weak zero-shot ability. Recently, the appearance of Large Language Models (LLMs) has gained lots of attraction in research fields. They have shown powerful zero-shot and in-context learning abilities on several NLP tasks, but their potential on TFV is still unknown. In this work, we implement a preliminary study about whether LLMs are table-based fact-checkers. In detail, we design diverse prompts to explore how the in-context learning can help LLMs in TFV, i.e., zero-shot and few-shot TFV capability. Besides, we carefully design and construct TFV instructions to study the performance gain brought by the instruction tuning of LLMs. Experimental results demonstrate that LLMs can achieve acceptable results on zero-shot and few-shot TFV with prompt engineering, while instruction-tuning can stimulate the TFV capability significantly. We also make some valuable findings about the format of zero-shot prompts and the number of in-context examples. Finally, we analyze some possible directions to promote the accuracy of TFV via LLMs, which is beneficial to further research of table reasoning.
- [775] arXiv:2402.02552 (cross-list from math.OC) [ pdf , ps , other ]
-
Title: Neur2BiLO: Neural Bilevel OptimizationSubjects: Optimization and Control (math.OC) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Bilevel optimization deals with nested problems in which a leader takes the first decision to minimize their objective function while accounting for a follower's best-response reaction. Constrained bilevel problems with integer variables are particularly notorious for their hardness. While exact solvers have been proposed for mixed-integer linear bilevel optimization, they tend to scale poorly with problem size and are hard to generalize to the non-linear case. On the other hand, problem-specific algorithms (exact and heuristic) are limited in scope. Under a data-driven setting in which similar instances of a bilevel problem are solved routinely, our proposed framework, Neur2BiLO, embeds a neural network approximation of the leader's or follower's value function, trained via supervised regression, into an easy-to-solve mixed-integer program. Neur2BiLO serves as a heuristic that produces high-quality solutions extremely fast for the bilevel knapsack interdiction problem, the "critical node game" from network security, a donor-recipient healthcare problem, and discrete network design from transportation planning. These problems are diverse in that they have linear or non-linear objectives/constraints and integer or mixed-integer variables, making Neur2BiLO unique in its versatility.
- [776] arXiv:2402.02563 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: DefInt: A Default-interventionist Framework for Efficient Reasoning with Hybrid Large Language ModelsComments: 18 pages, 10 figures, 14 tablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Large language models (LLMs) have shown impressive emergent abilities in a wide range of tasks, but still face challenges in handling complex reasoning problems. Previous works like chain-of-thought (CoT) and tree-of-thoughts(ToT) have predominately focused on enhancing accuracy, but overlook the rapidly increasing token cost, which could be particularly problematic for open-ended real-world tasks with huge solution spaces. Motivated by the dual process theory of human cognition, we propose a Default-Interventionist framework (DefInt) to unleash the synergistic potential of hybrid LLMs. By default, DefInt uses smaller-scale language models to generate low-cost reasoning thoughts, which resembles the fast intuitions produced by System 1. If the intuitions are considered with low confidence, DefInt will invoke the reflective reasoning of scaled-up language models as the intervention of System 2, which can override the default thoughts and rectify the reasoning process. Experiments on five representative reasoning tasks show that DefInt consistently achieves state-of-the-art reasoning accuracy and solution diversity. More importantly, it substantially reduces the token cost by 49%-79% compared to the second accurate baselines. Specifically, the open-ended tasks have an average 75% token cost reduction. Code repo with all prompts will be released upon publication.
- [777] arXiv:2402.02564 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: A Truly Joint Neural Architecture for Segmentation and ParsingSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Contemporary multilingual dependency parsers can parse a diverse set of languages, but for Morphologically Rich Languages (MRLs), performance is attested to be lower than other languages. The key challenge is that, due to high morphological complexity and ambiguity of the space-delimited input tokens, the linguistic units that act as nodes in the tree are not known in advance. Pre-neural dependency parsers for MRLs subscribed to the joint morpho-syntactic hypothesis, stating that morphological segmentation and syntactic parsing should be solved jointly, rather than as a pipeline where segmentation precedes parsing. However, neural state-of-the-art parsers to date use a strict pipeline. In this paper we introduce a joint neural architecture where a lattice-based representation preserving all morphological ambiguity of the input is provided to an arc-factored model, which then solves the morphological segmentation and syntactic parsing tasks at once. Our experiments on Hebrew, a rich and highly ambiguous MRL, demonstrate state-of-the-art performance on parsing, tagging and segmentation of the Hebrew section of UD, using a single model. This proposed architecture is LLM-based and language agnostic, providing a solid foundation for MRLs to obtain further performance improvements and bridge the gap with other languages.
- [778] arXiv:2402.02566 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: STAGE: Scalable and Traversability-Aware Graph based Exploration Planner for Dynamically Varying EnvironmentsSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: In this article, we propose a novel navigation framework that leverages a two layered graph representation of the environment for efficient large-scale exploration, while it integrates a novel uncertainty awareness scheme to handle dynamic scene changes in previously explored areas. The framework is structured around a novel goal oriented graph representation, that consists of, i) the local sub-graph and ii) the global graph layer respectively. The local sub-graphs encode local volumetric gain locations as frontiers, based on the direct pointcloud visibility, allowing fast graph building and path planning. Additionally, the global graph is build in an efficient way, using node-edge information exchange only on overlapping regions of sequential sub-graphs. Different from the state-of-the-art graph based exploration methods, the proposed approach efficiently re-uses sub-graphs built in previous iterations to construct the global navigation layer. Another merit of the proposed scheme is the ability to handle scene changes (e.g. blocked pathways), adaptively updating the obstructed part of the global graph from traversable to not-traversable. This operation involved oriented sample space of a path segment in the global graph layer, while removing the respective edges from connected nodes of the global graph in cases of obstructions. As such, the exploration behavior is directing the robot to follow another route in the global re-positioning phase through path-way updates in the global graph. Finally, we showcase the performance of the method both in simulation runs as well as deployed in real-world scene involving a legged robot carrying camera and lidar sensor.
- [779] arXiv:2402.02592 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Unified Training of Universal Time Series Forecasting TransformersSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Deep learning for time series forecasting has traditionally operated within a one-model-per-dataset framework, limiting its potential to leverage the game-changing impact of large pre-trained models. The concept of universal forecasting, emerging from pre-training on a vast collection of time series datasets, envisions a single Large Time Series Model capable of addressing diverse downstream forecasting tasks. However, constructing such a model poses unique challenges specific to time series data: i) cross-frequency learning, ii) accommodating an arbitrary number of variates for multivariate time series, and iii) addressing the varying distributional properties inherent in large-scale data. To address these challenges, we present novel enhancements to the conventional time series Transformer architecture, resulting in our proposed Masked Encoder-based Universal Time Series Forecasting Transformer (Moirai). Trained on our newly introduced Large-scale Open Time Series Archive (LOTSA) featuring over 27B observations across nine domains, Moirai achieves competitive or superior performance as a zero-shot forecaster when compared to full-shot models. Code, model weights, and data will be released.
- [780] arXiv:2402.02600 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: Evading Deep Learning-Based Malware Detectors via Obfuscation: A Deep Reinforcement Learning ApproachSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Adversarial Malware Generation (AMG), the generation of adversarial malware variants to strengthen Deep Learning (DL)-based malware detectors has emerged as a crucial tool in the development of proactive cyberdefense. However, the majority of extant works offer subtle perturbations or additions to executable files and do not explore full-file obfuscation. In this study, we show that an open-source encryption tool coupled with a Reinforcement Learning (RL) framework can successfully obfuscate malware to evade state-of-the-art malware detection engines and outperform techniques that use advanced modification methods. Our results show that the proposed method improves the evasion rate from 27%-49% compared to widely-used state-of-the-art reinforcement learning-based methods.
- [781] arXiv:2402.02623 (cross-list from cs.CE) [ pdf , ps , other ]
-
Title: Efficient Market Dynamics: Unraveling Informational Efficiency in UK Horse Racing Betting Markets Through Betfair's Time Series AnalysisSubjects: Computational Engineering, Finance, and Science (cs.CE) ; Artificial Intelligence (cs.AI)
Abstract: Using Betfair's time series data, an analysis of the United Kingdom (UK) horse racing market reveals an interesting paradox: a market with short tails, rapidly decaying autocorrelations, and no long-term memory. There seems to be a remarkably high level of informational efficiency in betting exchange returns, in contrast to financial assets that are characterized by heavy tails and volatility clustering. The generalized Gaussian unconditional distribution with a light tail point to a market where knowledge is quickly assimilated and reflected in prices. This is further supported by the extremely quick fading of autocorrelations and the absence of gain-loss asymmetry. Therefore, in addition to measuring long-range memory, the Hurst exponent also shows mean reversion, a sign that markets respond quickly to fresh information.
- [782] arXiv:2402.02624 (cross-list from cs.RO) [ pdf , ps , html , other ]
-
Title: A Safe Reinforcement Learning driven Weights-varying Model Predictive Control for Autonomous Vehicle Motion ControlSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)
Abstract: Determining the optimal cost function parameters of Model Predictive Control (MPC) to optimize multiple control objectives is a challenging and time-consuming task. Multiobjective Bayesian Optimization (BO) techniques solve this problem by determining a Pareto optimal parameter set for an MPC with static weights. However, a single parameter set may not deliver the most optimal closed-loop control performance when the context of the MPC operating conditions changes during its operation, urging the need to adapt the cost function weights at runtime. Deep Reinforcement Learning (RL) algorithms can automatically learn context-dependent optimal parameter sets and dynamically adapt for a Weightsvarying MPC (WMPC). However, learning cost function weights from scratch in a continuous action space may lead to unsafe operating states. To solve this, we propose a novel approach limiting the RL actions within a safe learning space representing a catalog of pre-optimized BO Pareto-optimal weight sets. We conceive a RL agent not to learn in a continuous space but to proactively anticipate upcoming control tasks and to choose the most optimal discrete actions, each corresponding to a single set of Pareto optimal weights, context-dependent. Hence, even an untrained RL agent guarantees a safe and optimal performance. Experimental results demonstrate that an untrained RL-WMPC shows Pareto-optimal closed-loop behavior and training the RL-WMPC helps exhibit a performance beyond the Pareto-front.
- [783] arXiv:2402.02625 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Enhancing Transformer RNNs with Multiple Temporal PerspectivesComments: 11 pages, 8 figures, 4 tables, in review for ICML 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: We introduce the concept of multiple temporal perspectives, a novel approach applicable to Recurrent Neural Network (RNN) architectures for enhancing their understanding of sequential data. This method involves maintaining diverse temporal views of previously encountered text, significantly enriching the language models' capacity to interpret context. To show the efficacy of this approach, we incorporate it into the Receptance Weighted Key Value (RWKV) architecture, addressing its inherent challenge of retaining all historical information within a single hidden state. Notably, this improvement is achieved with a minimal increase in the number of parameters --even as little as $0.04\%$ of the original number of parameters. Further, the additional parameters necessary for the multiple temporal perspectives are fine-tuned with minimal computational overhead, avoiding the need for a full pre-training. The resulting model maintains linear computational complexity during prompt inference, ensuring consistent efficiency across various sequence lengths. The empirical results and ablation studies included in our research validate the effectiveness of our approach, showcasing improved performance across multiple benchmarks. The code, model weights and datasets are open-sourced at: this https URL .
- [784] arXiv:2402.02636 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Can Large Language Models Learn Independent Causal Mechanisms?Comments: 17 pages, 8 pages for the main paper and 9 pages for references and appendices, 12 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG)
Abstract: Despite impressive performance on language modelling and complex reasoning tasks, Large Language Models (LLMs) fall short on the same tasks in uncommon settings or with distribution shifts, exhibiting some lack of generalisation ability. This issue has usually been alleviated by feeding more training data into the LLM. However, this method is brittle, as the scope of tasks may not be readily predictable or may evolve, and updating the model with new data generally requires extensive additional training. By contrast, systems, such as causal models, that learn abstract variables and causal relationships can demonstrate increased robustness against changes in the distribution. One reason for this success is the existence and use of Independent Causal Mechanisms (ICMs) representing high-level concepts that only sparsely interact. In this work, we apply two concepts from causality to learn ICMs within LLMs. We develop a new LLM architecture composed of multiple sparsely interacting language modelling modules. We introduce a routing scheme to induce specialisation of the network into domain-specific modules. We also present a Mutual Information minimisation objective that trains a separate module to learn abstraction and domain-invariant mechanisms. We show that such causal constraints can improve out-of-distribution performance on abstract and causal reasoning tasks.
- [785] arXiv:2402.02643 (cross-list from cs.DB) [ pdf , ps , other ]
-
Title: LLM-Enhanced Data ManagementSubjects: Databases (cs.DB) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Machine learning (ML) techniques for optimizing data management problems have been extensively studied and widely deployed in recent five years. However traditional ML methods have limitations on generalizability (adapting to different scenarios) and inference ability (understanding the context). Fortunately, large language models (LLMs) have shown high generalizability and human-competitive abilities in understanding context, which are promising for data management tasks (e.g., database diagnosis, database tuning). However, existing LLMs have several limitations: hallucination, high cost, and low accuracy for complicated tasks. To address these challenges, we design LLMDB, an LLM-enhanced data management paradigm which has generalizability and high inference ability while avoiding hallucination, reducing LLM cost, and achieving high accuracy. LLMDB embeds domain-specific knowledge to avoid hallucination by LLM fine-tuning and prompt engineering. LLMDB reduces the high cost of LLMs by vector databases which provide semantic search and caching abilities. LLMDB improves the task accuracy by LLM agent which provides multiple-round inference and pipeline executions. We showcase three real-world scenarios that LLMDB can well support, including query rewrite, database diagnosis and data analytics. We also summarize the open research challenges of LLMDB.
- [786] arXiv:2402.02648 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Recursive Chain-of-Feedback Prevents Performance Degradation from Redundant PromptingComments: Still Ongoing Work; 8 Pages; 2 FiguresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large Language Models (LLMs) frequently struggle with complex reasoning tasks, failing to construct logically sound steps towards the solution. In response to this behavior, users often try prompting the LLMs repeatedly in hopes of reaching a better response. This paper studies such repetitive behavior and its effect by defining a novel setting, Chain-of-Feedback (CoF). The setting takes questions that require multi-step reasoning as an input. Upon response, we repetitively prompt meaningless feedback (e.g. 'make another attempt') requesting additional trials. Surprisingly, our preliminary results show that repeated meaningless feedback gradually decreases the quality of the responses, eventually leading to a larger deviation from the intended outcome. To alleviate these troubles, we propose a novel method, Recursive Chain-of-Feedback (R-CoF). Following the logic of recursion in computer science, R-CoF recursively revises the initially incorrect response by breaking down each incorrect reasoning step into smaller individual problems. Our preliminary results show that majority of questions that LLMs fail to respond correctly can be answered using R-CoF without any sample data outlining the logical process.
- [787] arXiv:2402.02651 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Vision-Language Models Provide Promptable Representations for Reinforcement LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Humans can quickly learn new behaviors by leveraging background world knowledge. In contrast, agents trained with reinforcement learning (RL) typically learn behaviors from scratch. We thus propose a novel approach that uses the vast amounts of general and indexable world knowledge encoded in vision-language models (VLMs) pre-trained on Internet-scale data for embodied RL. We initialize policies with VLMs by using them as promptable representations: embeddings that are grounded in visual observations and encode semantic features based on the VLM's internal knowledge, as elicited through prompts that provide task context and auxiliary information. We evaluate our approach on visually-complex, long horizon RL tasks in Minecraft and robot navigation in Habitat. We find that our policies trained on embeddings extracted from general-purpose VLMs outperform equivalent policies trained on generic, non-promptable image embeddings. We also find our approach outperforms instruction-following methods and performs comparably to domain-specific embeddings.
- [788] arXiv:2402.02675 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Verifiable evaluations of machine learning models using zkSNARKsTobin South , Alexander Camuto , Shrey Jain , Shayla Nguyen , Robert Mahari , Christian Paquin , Jason Morton , Alex 'Sandy' PentlandSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: In a world of increasing closed-source commercial machine learning models, model evaluations from developers must be taken at face value. These benchmark results, whether over task accuracy, bias evaluations, or safety checks, are traditionally impossible to verify by a model end-user without the costly or impossible process of re-performing the benchmark on black-box model outputs. This work presents a method of verifiable model evaluation using model inference through zkSNARKs. The resulting zero-knowledge computational proofs of model outputs over datasets can be packaged into verifiable evaluation attestations showing that models with fixed private weights achieve stated performance or fairness metrics over public inputs. These verifiable attestations can be performed on any standard neural network model with varying compute requirements. For the first time, we demonstrate this across a sample of real-world models and highlight key challenges and design solutions. This presents a new transparency paradigm in the verifiable evaluation of private models.
- [789] arXiv:2402.02680 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Large Language Models are Geographically BiasedSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Abstract: Large Language Models (LLMs) inherently carry the biases contained in their training corpora, which can lead to the perpetuation of societal harm. As the impact of these foundation models grows, understanding and evaluating their biases becomes crucial to achieving fairness and accuracy. We propose to study what LLMs know about the world we live in through the lens of geography. This approach is particularly powerful as there is ground truth for the numerous aspects of human life that are meaningfully projected onto geographic space such as culture, race, language, politics, and religion. We show various problematic geographic biases, which we define as systemic errors in geospatial predictions. Initially, we demonstrate that LLMs are capable of making accurate zero-shot geospatial predictions in the form of ratings that show strong monotonic correlation with ground truth (Spearman's $\rho$ of up to 0.89). We then show that LLMs exhibit common biases across a range of objective and subjective topics. In particular, LLMs are clearly biased against locations with lower socioeconomic conditions (e.g. most of Africa) on a variety of sensitive subjective topics such as attractiveness, morality, and intelligence (Spearman's $\rho$ of up to 0.70). Finally, we introduce a bias score to quantify this and find that there is significant variation in the magnitude of bias across existing LLMs.
- [790] arXiv:2402.02681 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Equivariant Symmetry Breaking SetsComments: 28 pages, 30 figures Submitted to ICML 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Equivariant neural networks (ENNs) have been shown to be extremely effective in applications involving underlying symmetries. By construction ENNs cannot produce lower symmetry outputs given a higher symmetry input. However, spontaneous symmetry breaking occurs in many physical systems and we may obtain a less symmetric stable state from an initial highly symmetric one. Hence, it is imperative that we understand how to systematically break symmetry in ENNs. In this work, we propose a novel symmetry breaking framework that is fully equivariant. We emphasize that our approach is general and applicable to equivariance under any group. To achieve this, we introduce the idea of symmetry breaking sets (SBS). Rather than redesign existing networks, we design sets of symmetry breaking objects which we feed into our network based on the symmetry of our inputs and outputs. We show there is a natural way to define equivariance on these sets, which gives an additional constraint. Minimizing the size of these sets equates to data efficiency. We prove that minimizing these sets translates to a well studied group theory problem, and tabulate solutions to this problem for the point groups. Finally, we provide some examples of symmetry breaking to demonstrate how our approach works in practice.
- [791] arXiv:2402.02687 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Poisson Process for Bayesian OptimizationXiaoxing Wang , Jiaxing Li , Chao Xue , Wei Liu , Weifeng Liu , Xiaokang Yang , Junchi Yan , Dacheng TaoSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: BayesianOptimization(BO) is a sample-efficient black-box optimizer, and extensive methods have been proposed to build the absolute function response of the black-box function through a probabilistic surrogate model, including Tree-structured Parzen Estimator (TPE), random forest (SMAC), and Gaussian process (GP). However, few methods have been explored to estimate the relative rankings of candidates, which can be more robust to noise and have better practicality than absolute function responses, especially when the function responses are intractable but preferences can be acquired. To this end, we propose a novel ranking-based surrogate model based on the Poisson process and introduce an efficient BO framework, namely Poisson Process Bayesian Optimization (PoPBO). Two tailored acquisition functions are further derived from classic LCB and EI to accommodate it. Compared to the classic GP-BO method, our PoPBO has lower computation costs and better robustness to noise, which is verified by abundant experiments. The results on both simulated and real-world benchmarks, including hyperparameter optimization (HPO) and neural architecture search (NAS), show the effectiveness of PoPBO.
- [792] arXiv:2402.02695 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Exploiting Class Probabilities for Black-box Sentence-level AttacksComments: EACL 2024 FindingsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Abstract: Sentence-level attacks craft adversarial sentences that are synonymous with correctly-classified sentences but are misclassified by the text classifiers. Under the black-box setting, classifiers are only accessible through their feedback to queried inputs, which is predominately available in the form of class probabilities. Even though utilizing class probabilities results in stronger attacks, due to the challenges of using them for sentence-level attacks, existing attacks use either no feedback or only the class labels. Overcoming the challenges, we develop a novel algorithm that uses class probabilities for black-box sentence-level attacks, investigate the effectiveness of using class probabilities on the attack's success, and examine the question if it is worthy or practical to use class probabilities by black-box sentence-level attacks. We conduct extensive evaluations of our attack comparing with the baselines across various classifiers and benchmark datasets.
- [793] arXiv:2402.02696 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Causal Feature Selection for Responsible Machine LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Machine Learning (ML) has become an integral aspect of many real-world applications. As a result, the need for responsible machine learning has emerged, focusing on aligning ML models to ethical and social values, while enhancing their reliability and trustworthiness. Responsible ML involves many issues. This survey addresses four main issues: interpretability, fairness, adversarial robustness, and domain generalization. Feature selection plays a pivotal role in the responsible ML tasks. However, building upon statistical correlations between variables can lead to spurious patterns with biases and compromised performance. This survey focuses on the current study of causal feature selection: what it is and how it can reinforce the four aspects of responsible ML. By identifying features with causal impacts on outcomes and distinguishing causality from correlation, causal feature selection is posited as a unique approach to ensuring ML models to be ethically and socially responsible in high-stakes applications.
- [794] arXiv:2402.02698 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Beyond Expectations: Learning with Stochastic Dominance Made PracticalSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Abstract: Stochastic dominance models risk-averse preferences for decision making with uncertain outcomes, which naturally captures the intrinsic structure of the underlying uncertainty, in contrast to simply resorting to the expectations. Despite theoretically appealing, the application of stochastic dominance in machine learning has been scarce, due to the following challenges: $\textbf{i)}$, the original concept of stochastic dominance only provides a $\textit{partial order}$, therefore, is not amenable to serve as an optimality criterion; and $\textbf{ii)}$, an efficient computational recipe remains lacking due to the continuum nature of evaluating stochastic dominance.%, which barriers its application for machine learning.
In this work, we make the first attempt towards establishing a general framework of learning with stochastic dominance. We first generalize the stochastic dominance concept to enable feasible comparisons between any arbitrary pair of random variables. We next develop a simple and computationally efficient approach for finding the optimal solution in terms of stochastic dominance, which can be seamlessly plugged into many learning tasks. Numerical experiments demonstrate that the proposed method achieves comparable performance as standard risk-neutral strategies and obtains better trade-offs against risk across a variety of applications including supervised learning, reinforcement learning, and portfolio optimization. - [795] arXiv:2402.02701 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Understanding What Affects Generalization Gap in Visual Reinforcement Learning: Theory and Empirical EvidenceComments: Part of this work is accepted as AAMAS 2024 extended abstractSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: Recently, there are many efforts attempting to learn useful policies for continuous control in visual reinforcement learning (RL). In this scenario, it is important to learn a generalizable policy, as the testing environment may differ from the training environment, e.g., there exist distractors during deployment. Many practical algorithms are proposed to handle this problem. However, to the best of our knowledge, none of them provide a theoretical understanding of what affects the generalization gap and why their proposed methods work. In this paper, we bridge this issue by theoretically answering the key factors that contribute to the generalization gap when the testing environment has distractors. Our theories indicate that minimizing the representation distance between training and testing environments, which aligns with human intuition, is the most critical for the benefit of reducing the generalization gap. Our theoretical results are supported by the empirical evidence in the DMControl Generalization Benchmark (DMC-GB).
- [796] arXiv:2402.02705 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Representation Surgery for Multi-Task Model MergingJournal-ref: Forty-first International Conference on Machine Learning (ICML 2024)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Multi-task learning (MTL) compresses the information from multiple tasks into a unified backbone to improve computational efficiency and generalization. Recent work directly merges multiple independently trained models to perform MTL instead of collecting their raw data for joint training, greatly expanding the application scenarios of MTL. However, by visualizing the representation distribution of existing model merging schemes, we find that the merged model often suffers from the dilemma of representation bias. That is, there is a significant discrepancy in the representation distribution between the merged and individual models, resulting in poor performance of merged MTL. In this paper, we propose a representation surgery solution called "Surgery" to reduce representation bias in the merged model. Specifically, Surgery is a lightweight task-specific module that takes the representation of the merged model as input and attempts to output the biases contained in the representation from the merged model. We then designed an unsupervised optimization objective that updates the Surgery module by minimizing the distance between the merged model's representation and the individual model's representation. Extensive experiments demonstrate significant MTL performance improvements when our Surgery module is applied to state-of-the-art (SOTA) model merging schemes.
- [797] arXiv:2402.02713 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Position Paper: What Can Large Language Models Tell Us about Time Series AnalysisMing Jin , Yifan Zhang , Wei Chen , Kexin Zhang , Yuxuan Liang , Bin Yang , Jindong Wang , Shirui Pan , Qingsong WenComments: 16 pages, 8 figures, 1 tableSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Time series analysis is essential for comprehending the complexities inherent in various real-world systems and applications. Although large language models (LLMs) have recently made significant strides, the development of artificial general intelligence (AGI) equipped with time series analysis capabilities remains in its nascent phase. Most existing time series models heavily rely on domain knowledge and extensive model tuning, predominantly focusing on prediction tasks. In this paper, we argue that current LLMs have the potential to revolutionize time series analysis, thereby promoting efficient decision-making and advancing towards a more universal form of time series analytical intelligence. Such advancement could unlock a wide range of possibilities, including modality switching and time series question answering. We encourage researchers and practitioners to recognize the potential of LLMs in advancing time series analysis and emphasize the need for trust in these related efforts. Furthermore, we detail the seamless integration of time series analysis with existing LLM technologies and outline promising avenues for future research.
- [798] arXiv:2402.02718 (cross-list from cs.IR) [ pdf , ps , other ]
-
Title: Denoising Time Cycle Modeling for RecommendationSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI)
Abstract: Recently, modeling temporal patterns of user-item interactions have attracted much attention in recommender systems. We argue that existing methods ignore the variety of temporal patterns of user behaviors. We define the subset of user behaviors that are irrelevant to the target item as noises, which limits the performance of target-related time cycle modeling and affect the recommendation performance. In this paper, we propose Denoising Time Cycle Modeling (DiCycle), a novel approach to denoise user behaviors and select the subset of user behaviors that are highly related to the target item. DiCycle is able to explicitly model diverse time cycle patterns for recommendation. Extensive experiments are conducted on both public benchmarks and a real-world dataset, demonstrating the superior performance of DiCycle over the state-of-the-art recommendation methods.
- [799] arXiv:2402.02732 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: A Generative Approach to Surrogate-based Black-box AttacksSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: Surrogate-based black-box attacks have exposed the heightened vulnerability of DNNs. These attacks are designed to craft adversarial examples for any samples with black-box target feedback for only a given set of samples. State-of-the-art surrogate-based attacks involve training a discriminative surrogate that mimics the target's outputs. The goal is to learn the decision boundaries of the target. The surrogate is then attacked by white-box attacks to craft adversarial examples similar to the original samples but belong to other classes. With limited samples, the discriminative surrogate fails to accurately learn the target's decision boundaries, and these surrogate-based attacks suffer from low success rates. Different from the discriminative approach, we propose a generative surrogate that learns the distribution of samples residing on or close to the target's decision boundaries. The distribution learned by the generative surrogate can be used to craft adversarial examples that have imperceptible differences from the original samples but belong to other classes. The proposed generative approach results in attacks with remarkably high attack success rates on various targets and datasets.
- [800] arXiv:2402.02733 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: ToonAging: Face Re-Aging upon Artistic Portrait Style TransferComments: 14 pages, 15 figures, 1 tableSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG); Multimedia (cs.MM)
Abstract: Face re-aging is a prominent field in computer vision and graphics, with significant applications in photorealistic domains such as movies, advertising, and live streaming. Recently, the need to apply face re-aging to non-photorealistic images, like comics, illustrations, and animations, has emerged as an extension in various entertainment sectors. However, the lack of a network that can seamlessly edit the apparent age in NPR images has limited these tasks to a naive, sequential approach. This often results in unpleasant artifacts and a loss of facial attributes due to domain discrepancies. In this paper, we introduce a novel one-stage method for face re-aging combined with portrait style transfer, executed in a single generative step. We leverage existing face re-aging and style transfer networks, both trained within the same PR domain. Our method uniquely fuses distinct latent vectors, each responsible for managing aging-related attributes and NPR appearance. By adopting an exemplar-based approach, our method offers greater flexibility compared to domain-level fine-tuning approaches, which typically require separate training or fine-tuning for each domain. This effectively addresses the limitation of requiring paired datasets for re-aging and domain-level, data-driven approaches for stylization. Our experiments show that our model can effortlessly generate re-aged images while simultaneously transferring the style of examples, maintaining both natural appearance and controllability.
- [801] arXiv:2402.02764 (cross-list from cs.IR) [ pdf , ps , other ]
-
Title: List-aware Reranking-Truncation Joint Model for Search and Retrieval-augmented GenerationComments: Accepted by WWW 2024Subjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: The results of information retrieval (IR) are usually presented in the form of a ranked list of candidate documents, such as web search for humans and retrieval-augmented generation for large language models (LLMs). List-aware retrieval aims to capture the list-level contextual features to return a better list, mainly including reranking and truncation. Reranking finely re-scores the documents in the list. Truncation dynamically determines the cut-off point of the ranked list to achieve the trade-off between overall relevance and avoiding misinformation from irrelevant documents. Previous studies treat them as two separate tasks and model them separately. However, the separation is not optimal. First, it is hard to share the contextual information of the ranking list between the two tasks. Second, the separate pipeline usually meets the error accumulation problem, where the small error from the reranking stage can largely affect the truncation stage. To solve these problems, we propose a Reranking-Truncation joint model (GenRT) that can perform the two tasks concurrently. GenRT integrates reranking and truncation via generative paradigm based on encoder-decoder architecture. We also design the novel loss functions for joint optimization to make the model learn both tasks. Sharing parameters by the joint model is conducive to making full use of the common modeling information of the two tasks. Besides, the two tasks are performed concurrently and co-optimized to solve the error accumulation problem between separate stages. Experiments on public learning-to-rank benchmarks and open-domain Q\&A tasks show that our method achieves SOTA performance on both reranking and truncation tasks for web search and retrieval-augmented LLMs.
- [802] arXiv:2402.02768 (cross-list from cs.NI) [ pdf , ps , other ]
-
Title: Intent Profiling and Translation Through Emergent CommunicationJournal-ref: IEEE International Conference on Communications (ICC2024)Subjects: Networking and Internet Architecture (cs.NI) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: To effectively express and satisfy network application requirements, intent-based network management has emerged as a promising solution. In intent-based methods, users and applications express their intent in a high-level abstract language to the network. Although this abstraction simplifies network operation, it induces many challenges to efficiently express applications' intents and map them to different network capabilities. Therefore, in this work, we propose an AI-based framework for intent profiling and translation. We consider a scenario where applications interacting with the network express their needs for network services in their domain language. The machine-to-machine communication (i.e., between applications and the network) is complex since it requires networks to learn how to understand the domain languages of each application, which is neither practical nor scalable. Instead, a framework based on emergent communication is proposed for intent profiling, in which applications express their abstract quality-of-experience (QoE) intents to the network through emergent communication messages. Subsequently, the network learns how to interpret these communication messages and map them to network capabilities (i.e., slices) to guarantee the requested Quality-of-Service (QoS). Simulation results show that the proposed method outperforms self-learning slicing and other baselines, and achieves a performance close to the perfect knowledge baseline.
- [803] arXiv:2402.02769 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Learning from Teaching Regularization: Generalizable Correlations Should be Easy to ImitateSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Generalization remains a central challenge in machine learning. In this work, we propose Learning from Teaching (LoT), a novel regularization technique for deep neural networks to enhance generalization. Inspired by the human ability to capture concise and abstract patterns, we hypothesize that generalizable correlations are expected to be easier to teach. LoT operationalizes this concept to improve the generalization of the main model with auxiliary student learners. The student learners are trained by the main model and improve the main model to capture more generalizable and teachable correlations by providing feedback. Our experimental results across several domains, including Computer Vision, Natural Language Processing, and Reinforcement Learning, demonstrate that the introduction of LoT brings significant benefits compared to merely training models on the original training data. It suggests the effectiveness of LoT in identifying generalizable information without falling into the swamp of complex patterns in data, making LoT a valuable addition to the current machine learning frameworks.
- [804] arXiv:2402.02781 (cross-list from cs.SD) [ pdf , ps , other ]
-
Title: Dual Knowledge Distillation for Efficient Sound Event DetectionComments: Accepted to ICASSP 2024 (Deep Neural Network Model Compression Workshop)Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Abstract: Sound event detection (SED) is essential for recognizing specific sounds and their temporal locations within acoustic signals. This becomes challenging particularly for on-device applications, where computational resources are limited. To address this issue, we introduce a novel framework referred to as dual knowledge distillation for developing efficient SED systems in this work. Our proposed dual knowledge distillation commences with temporal-averaging knowledge distillation (TAKD), utilizing a mean student model derived from the temporal averaging of the student model's parameters. This allows the student model to indirectly learn from a pre-trained teacher model, ensuring a stable knowledge distillation. Subsequently, we introduce embedding-enhanced feature distillation (EEFD), which involves incorporating an embedding distillation layer within the student model to bolster contextual learning. On DCASE 2023 Task 4A public evaluation dataset, our proposed SED system with dual knowledge distillation having merely one-third of the baseline model's parameters, demonstrates superior performance in terms of PSDS1 and PSDS2. This highlights the importance of proposed dual knowledge distillation for compact SED systems, which can be ideal for edge devices.
- [805] arXiv:2402.02791 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Rethinking Optimization and Architecture for Tiny Language ModelsYehui Tang , Fangcheng Liu , Yunsheng Ni , Yuchuan Tian , Zheyuan Bai , Yi-Qi Hu , Sichao Liu , Shangling Jui , Kai Han , Yunhe WangSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The power of large language models (LLMs) has been demonstrated through numerous data and computing resources. However, the application of language models on mobile devices is facing huge challenge on the computation and memory costs, that is, tiny language models with high performance are urgently required. Limited by the highly complex training process, there are many details for optimizing language models that are seldom studied carefully. In this study, based on a tiny language model with 1B parameters, we carefully design a series of empirical study to analyze the effect of each component. Three perspectives are mainly discussed, \ie, neural architecture, parameter initialization, and optimization strategy. Several design formulas are empirically proved especially effective for tiny language models, including tokenizer compression, architecture tweaking, parameter inheritance and multiple-round training. Then we train PanGu-$\pi$-1B Pro and PanGu-$\pi$-1.5B Pro on 1.6T multilingual corpora, following the established formulas. Experimental results demonstrate the improved optimization and architecture yield a notable average improvement of 8.87 on benchmark evaluation sets for PanGu-$\pi$-1B Pro. Besides, PanGu-$\pi$-1.5B Pro surpasses a range of SOTA models with larger model sizes, validating its superior performance. The code is available at this https URL .
- [806] arXiv:2402.02801 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: KS-Lottery: Finding Certified Lottery Tickets for Multilingual Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The lottery ticket hypothesis posits the existence of ``winning tickets'' within a randomly initialized neural network. Do winning tickets exist for LLMs in fine-tuning scenarios? How can we find such winning tickets? In this paper, we propose KS-Lottery, a method to identify a small subset of LLM parameters highly effective in multilingual fine-tuning. Our key idea is to use Kolmogorov-Smirnov Test to analyze the distribution shift of parameters before and after fine-tuning. We further theoretically prove that KS-Lottery can find the certified winning tickets in the embedding layer, fine-tuning on the found parameters is guaranteed to perform as well as full fine-tuning. Comparing KS-Lottery with other parameter-efficient tuning algorithms on translation tasks, the experimental results show that KS-Lottery finds a much smaller set of parameters for fine-tuning while achieving the comparable performance as full fine-tuning LLM. Surprisingly, we find that fine-tuning 18 tokens' embedding of LLaMA suffices to reach the fine-tuning translation performance. Code and model will be released to the public.
- [807] arXiv:2402.02803 (cross-list from cs.IR) [ pdf , ps , other ]
-
Title: Large Language Model Distilling Medication Recommendation ModelSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The recommendation of medication is a vital aspect of intelligent healthcare systems, as it involves prescribing the most suitable drugs based on a patient's specific health needs. Unfortunately, many sophisticated models currently in use tend to overlook the nuanced semantics of medical data, while only relying heavily on identities. Furthermore, these models face significant challenges in handling cases involving patients who are visiting the hospital for the first time, as they lack prior prescription histories to draw upon. To tackle these issues, we harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs). Our research aims to transform existing medication recommendation methodologies using LLMs. In this paper, we introduce a novel approach called Large Language Model Distilling Medication Recommendation (LEADER). We begin by creating appropriate prompt templates that enable LLMs to suggest medications effectively. However, the straightforward integration of LLMs into recommender systems leads to an out-of-corpus issue specific to drugs. We handle it by adapting the LLMs with a novel output layer and a refined tuning loss function. Although LLM-based models exhibit remarkable capabilities, they are plagued by high computational costs during inference, which is impractical for the healthcare sector. To mitigate this, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model. Extensive experiments conducted on two real-world datasets, MIMIC-III and MIMIC-IV, demonstrate that our proposed model not only delivers effective results but also is efficient. To ease the reproducibility of our experiments, we release the implementation code online.
- [808] arXiv:2402.02823 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Evading Data Contamination Detection for Language Models is (too) EasySubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
Abstract: Large language models are widespread, with their performance on benchmarks frequently guiding user preferences for one model over another. However, the vast amount of data these models are trained on can inadvertently lead to contamination with public benchmarks, thus compromising performance measurements. While recently developed contamination detection methods try to address this issue, they overlook the possibility of deliberate contamination by malicious model providers aiming to evade detection. We argue that this setting is of crucial importance as it casts doubt on the reliability of public benchmarks. To more rigorously study this issue, we propose a categorization of both model providers and contamination detection methods. This reveals vulnerabilities in existing methods that we exploit with EAL, a simple yet effective contamination technique that significantly inflates benchmark performance while completely evading current detection methods.
- [809] arXiv:2402.02826 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: SynthVision -- Harnessing Minimal Input for Maximal Output in Computer Vision Models using Synthetic Image dataYudara Kularathne , Prathapa Janitha , Sithira Ambepitiya , Thanveer Ahamed , Dinuka Wijesundara , Prarththanan SothyrajahComments: 12 pages 5 figures 1 tableSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Rapid development of disease detection computer vision models is vital in response to urgent medical crises like epidemics or events of bioterrorism. However, traditional data gathering methods are too slow for these scenarios necessitating innovative approaches to generate reliable models quickly from minimal data. We demonstrate our new approach by building a comprehensive computer vision model for detecting Human Papilloma Virus Genital warts using only synthetic data. In our study, we employed a two phase experimental design using diffusion models. In the first phase diffusion models were utilized to generate a large number of diverse synthetic images from 10 HPV guide images explicitly focusing on accurately depicting genital warts. The second phase involved the training and testing vision model using this synthetic dataset. This method aimed to assess the effectiveness of diffusion models in rapidly generating high quality training data and the subsequent impact on the vision model performance in medical image recognition. The study findings revealed significant insights into the performance of the vision model trained on synthetic images generated through diffusion models. The vision model showed exceptional performance in accurately identifying cases of genital warts. It achieved an accuracy rate of 96% underscoring its effectiveness in medical image classification. For HPV cases the model demonstrated a high precision of 99% and a recall of 94%. In normal cases the precision was 95% with an impressive recall of 99%. These metrics indicate the model capability to correctly identify true positive cases and minimize false positives. The model achieved an F1 Score of 96% for HPV cases and 97% for normal cases. The high F1 Score across both categories highlights the balanced nature of the model precision and recall ensuring reliability and robustness in its predictions.
- [810] arXiv:2402.02844 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Comparing Knowledge Sources for Open-Domain Scientific Claim VerificationComments: Accepted to EACL 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract: The increasing rate at which scientific knowledge is discovered and health claims shared online has highlighted the importance of developing efficient fact-checking systems for scientific claims. The usual setting for this task in the literature assumes that the documents containing the evidence for claims are already provided and annotated or contained in a limited corpus. This renders the systems unrealistic for real-world settings where knowledge sources with potentially millions of documents need to be queried to find relevant evidence. In this paper, we perform an array of experiments to test the performance of open-domain claim verification systems. We test the final verdict prediction of systems on four datasets of biomedical and health claims in different settings. While keeping the pipeline's evidence selection and verdict prediction parts constant, document retrieval is performed over three common knowledge sources (PubMed, Wikipedia, Google) and using two different information retrieval techniques. We show that PubMed works better with specialized biomedical claims, while Wikipedia is more suited for everyday health concerns. Likewise, BM25 excels in retrieval precision, while semantic search in recall of relevant evidence. We discuss the results, outline frequent retrieval patterns and challenges, and provide promising future directions.
- [811] arXiv:2402.02896 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language ModelsComments: To appear in Proceedings of the 1st Personalization of Generative AI Workshop, EACL 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Multiagent Systems (cs.MA)
Abstract: While both agent interaction and personalisation are vibrant topics in research on large language models (LLMs), there has been limited focus on the effect of language interaction on the behaviour of persona-conditioned LLM agents. Such an endeavour is important to ensure that agents remain consistent to their assigned traits yet are able to engage in open, naturalistic dialogues. In our experiments, we condition GPT-3.5 on personality profiles through prompting and create a two-group population of LLM agents using a simple variability-inducing sampling algorithm. We then administer personality tests and submit the agents to a collaborative writing task, finding that different profiles exhibit different degrees of personality consistency and linguistic alignment to their conversational partners. Our study seeks to lay the groundwork for better understanding of dialogue-based interaction between LLMs and highlights the need for new approaches to crafting robust, more human-like LLM personas for interactive environments.
- [812] arXiv:2402.02904 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: Replication of Impedance Identification Experiments on a Reinforcement-Learning-Controlled Digital Twin of Human ElbowsComments: 8 pages, 5 figures; Submitted to WCCI-2024Subjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This study presents a pioneering effort to replicate human neuromechanical experiments within a virtual environment utilising a digital human model. By employing MyoSuite, a state-of-the-art human motion simulation platform enhanced by Reinforcement Learning (RL), multiple types of impedance identification experiments of human elbow were replicated on a musculoskeletal model. We compared the elbow movement controlled by an RL agent with the motion of an actual human elbow in terms of the impedance identified in torque-perturbation experiments. The findings reveal that the RL agent exhibits higher elbow impedance to stabilise the target elbow motion under perturbation than a human does, likely due to its shorter reaction time and superior sensory capabilities. This study serves as a preliminary exploration into the potential of virtual environment simulations for neuromechanical research, offering an initial yet promising alternative to conventional experimental approaches. An RL-controlled digital twin with complete musculoskeletal models of the human body is expected to be useful in designing experiments and validating rehabilitation theory before experiments on real human subjects.
- [813] arXiv:2402.02910 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: DS-MS-TCN: Otago Exercises Recognition with a Dual-Scale Multi-Stage Temporal Convolutional NetworkMeng Shang , Lenore Dedeyne , Jolan Dupont , Laura Vercauteren , Nadjia Amini , Laurence Lapauw , Evelien Gielen , Sabine Verschueren , Carolina Varon , Walter De Raedt , Bart VanrumsteSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Abstract: The Otago Exercise Program (OEP) represents a crucial rehabilitation initiative tailored for older adults, aimed at enhancing balance and strength. Despite previous efforts utilizing wearable sensors for OEP recognition, existing studies have exhibited limitations in terms of accuracy and robustness. This study addresses these limitations by employing a single waist-mounted Inertial Measurement Unit (IMU) to recognize OEP exercises among community-dwelling older adults in their daily lives. A cohort of 36 older adults participated in laboratory settings, supplemented by an additional 7 older adults recruited for at-home assessments. The study proposes a Dual-Scale Multi-Stage Temporal Convolutional Network (DS-MS-TCN) designed for two-level sequence-to-sequence classification, incorporating them in one loss function. In the first stage, the model focuses on recognizing each repetition of the exercises (micro labels). Subsequent stages extend the recognition to encompass the complete range of exercises (macro labels). The DS-MS-TCN model surpasses existing state-of-the-art deep learning models, achieving f1-scores exceeding 80% and Intersection over Union (IoU) f1-scores surpassing 60% for all four exercises evaluated. Notably, the model outperforms the prior study utilizing the sliding window technique, eliminating the need for post-processing stages and window size tuning. To our knowledge, we are the first to present a novel perspective on enhancing Human Activity Recognition (HAR) systems through the recognition of each repetition of activities.
- [814] arXiv:2402.02921 (cross-list from cs.DB) [ pdf , ps , other ]
-
Title: Mining a Minimal Set of Behavioral Patterns using Incremental EvaluationSubjects: Databases (cs.DB) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Process mining provides methods to analyse event logs generated by information systems during the execution of processes. It thereby supports the design, validation, and execution of processes in domains ranging from healthcare, through manufacturing, to e-commerce. To explore the regularities of flexible processes that show a large behavioral variability, it was suggested to mine recurrent behavioral patterns that jointly describe the underlying process. Existing approaches to behavioral pattern mining, however, suffer from two limitations. First, they show limited scalability as incremental computation is incorporated only in the generation of pattern candidates, but not in the evaluation of their quality. Second, process analysis based on mined patterns shows limited effectiveness due to an overwhelmingly large number of patterns obtained in practical application scenarios, many of which are redundant. In this paper, we address these limitations to facilitate the analysis of complex, flexible processes based on behavioral patterns. Specifically, we improve COBPAM, our initial behavioral pattern mining algorithm, by an incremental procedure to evaluate the quality of pattern candidates, optimizing thereby its efficiency. Targeting a more effective use of the resulting patterns, we further propose pruning strategies for redundant patterns and show how relations between the remaining patterns are extracted and visualized to provide process insights. Our experiments with diverse real-world datasets indicate a considerable reduction of the runtime needed for pattern mining, while a qualitative assessment highlights how relations between patterns guide the analysis of the underlying process.
- [815] arXiv:2402.02977 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Variational Flow Models: Flowing in Your StyleSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: We introduce "posterior flows" - generalizations of "probability flows" to a broader class of stochastic processes not necessarily diffusion processes - and propose a systematic training-free method to transform the posterior flow of a "linear" stochastic process characterized by the equation Xt = at * X0 + st * X1 into a straight constant-speed (SC) flow, reminiscent of Rectified Flow. This transformation facilitates fast sampling along the original posterior flow without training a new model of the SC flow. The flexibility of our approach allows us to extend our transformation to inter-convert two posterior flows from distinct "linear" stochastic processes. Moreover, we can easily integrate high-order numerical solvers into the transformed SC flow, further enhancing sampling accuracy and efficiency. Rigorous theoretical analysis and extensive experimental results substantiate the advantages of our framework.
- [816] arXiv:2402.02987 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: Conversation Reconstruction Attack Against GPT ModelsComments: 17 pages, 11 figuresSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: In recent times, significant advancements have been made in the field of large language models (LLMs), represented by GPT series models. To optimize task execution, users often engage in multi-round conversations with GPT models hosted in cloud environments. These multi-round conversations, potentially replete with private information, require transmission and storage within the cloud. However, this operational paradigm introduces additional attack surfaces. In this paper, we first introduce a specific Conversation Reconstruction Attack targeting GPT models. Our introduced Conversation Reconstruction Attack is composed of two steps: hijacking a session and reconstructing the conversations. Subsequently, we offer an exhaustive evaluation of the privacy risks inherent in conversations when GPT models are subjected to the proposed attack. However, GPT-4 demonstrates certain robustness to the proposed attacks. We then introduce two advanced attacks aimed at better reconstructing previous conversations, specifically the UNR attack and the PBU attack. Our experimental findings indicate that the PBU attack yields substantial performance across all models, achieving semantic similarity scores exceeding 0.60, while the UNR attack is effective solely on GPT-3.5. Our results reveal the concern about privacy risks associated with conversations involving GPT models and aim to draw the community's attention to prevent the potential misuse of these models' remarkable capabilities. We will responsibly disclose our findings to the suppliers of related large language models.
- [817] arXiv:2402.02992 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Decoding-time Realignment of Language ModelsTianlin Liu , Shangmin Guo , Leonardo Bianco , Daniele Calandriello , Quentin Berthet , Felipe Llinares , Jessica Hoffmann , Lucas Dixon , Michal Valko , Mathieu BlondelSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Aligning language models with human preferences is crucial for reducing errors and biases in these models. Alignment techniques, such as reinforcement learning from human feedback (RLHF), are typically cast as optimizing a tradeoff between human preference rewards and a proximity regularization term that encourages staying close to the unaligned model. Selecting an appropriate level of regularization is critical: insufficient regularization can lead to reduced model capabilities due to reward hacking, whereas excessive regularization hinders alignment. Traditional methods for finding the optimal regularization level require retraining multiple models with varying regularization strengths. This process, however, is resource-intensive, especially for large models. To address this challenge, we propose decoding-time realignment (DeRa), a simple method to explore and evaluate different regularization strengths in aligned models without retraining. DeRa enables control over the degree of alignment, allowing users to smoothly transition between unaligned and aligned models. It also enhances the efficiency of hyperparameter tuning by enabling the identification of effective regularization strengths using a validation dataset.
- [818] arXiv:2402.03009 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: UniMem: Towards a Unified View of Long-Context Large Language ModelsJunjie Fang , Likai Tang , Hongzhe Bi , Yujia Qin , Si Sun , Zhenyu Li , Haolun Li , Yongjian Li , Xin Cong , Yukun Yan , Xiaodong Shi , Sen Song , Yankai Lin , Zhiyuan Liu , Maosong SunSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Long-context processing is a critical ability that constrains the applicability of large language models. Although there exist various methods devoted to enhancing the long-context processing ability of large language models (LLMs), they are developed in an isolated manner and lack systematic analysis and integration of their strengths, hindering further developments. In this paper, we introduce UniMem, a unified framework that reformulates existing long-context methods from the view of memory augmentation of LLMs. UniMem is characterized by four key dimensions: Memory Management, Memory Writing, Memory Reading, and Memory Injection, providing a systematic theory for understanding various long-context methods. We reformulate 16 existing methods based on UniMem and analyze four representative methods: Transformer-XL, Memorizing Transformer, RMT, and Longformer into equivalent UniMem forms to reveal their design principles and strengths. Based on these analyses, we propose UniMix, an innovative approach that integrates the strengths of these algorithms. Experimental results show that UniMix achieves superior performance in handling long contexts with significantly lower perplexity than baselines.
- [819] arXiv:2402.03014 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Whom to Trust? Elective Learning for Distributed Gaussian Process RegressionComments: 9 pages, conference preprintSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: This paper introduces an innovative approach to enhance distributed cooperative learning using Gaussian process (GP) regression in multi-agent systems (MASs). The key contribution of this work is the development of an elective learning algorithm, namely prior-aware elective distributed GP (Pri-GP), which empowers agents with the capability to selectively request predictions from neighboring agents based on their trustworthiness. The proposed Pri-GP effectively improves individual prediction accuracy, especially in cases where the prior knowledge of an agent is incorrect. Moreover, it eliminates the need for computationally intensive variance calculations for determining aggregation weights in distributed GP. Furthermore, we establish a prediction error bound within the Pri-GP framework, ensuring the reliability of predictions, which is regarded as a crucial property in safety-critical MAS applications.
- [820] arXiv:2402.03017 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Toward Green and Human-Like Artificial Intelligence: A Complete Survey on Contemporary Few-Shot Learning ApproachesGeorgios Tsoumplekas , Vladislav Li , Vasileios Argyriou , Anastasios Lytos , Eleftherios Fountoukidis , Sotirios K. Goudos , Ioannis D. Moscholios , Panagiotis SarigiannidisComments: 35 pages, 9 figures. Submitted to ACM Computing SurveysSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Despite deep learning's widespread success, its data-hungry and computationally expensive nature makes it impractical for many data-constrained real-world applications. Few-Shot Learning (FSL) aims to address these limitations by enabling rapid adaptation to novel learning tasks, seeing significant growth in recent years. This survey provides a comprehensive overview of the field's latest advancements. Initially, FSL is formally defined, and its relationship with different learning fields is presented. A novel taxonomy is introduced, extending previously proposed ones, and real-world applications in classic and novel fields are described. Finally, recent trends shaping the field, outstanding challenges, and promising future research directions are discussed.
- [821] arXiv:2402.03038 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Automatic Combination of Sample Selection Strategies for Few-Shot LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: In few-shot learning, such as meta-learning, few-shot fine-tuning or in-context learning, the limited number of samples used to train a model have a significant impact on the overall success. Although a large number of sample selection strategies exist, their impact on the performance of few-shot learning is not extensively known, as most of them have been so far evaluated in typical supervised settings only. In this paper, we thoroughly investigate the impact of 20 sample selection strategies on the performance of 5 few-shot learning approaches over 8 image and 6 text datasets. In addition, we propose a new method for automatic combination of sample selection strategies (ACSESS) that leverages the strengths and complementary information of the individual strategies. The experimental results show that our method consistently outperforms the individual selection strategies, as well as the recently proposed method for selecting support examples for in-context learning. We also show a strong modality, dataset and approach dependence for the majority of strategies as well as their dependence on the number of shots - demonstrating that the sample selection strategies play a significant role for lower number of shots, but regresses to random selection at higher number of shots.
- [822] arXiv:2402.03040 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal InstructionsComments: Code, models, and demo are available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
Abstract: We introduce $\textit{InteractiveVideo}$, a user-centric framework for video generation. Different from traditional generative approaches that operate based on user-provided images or text, our framework is designed for dynamic interaction, allowing users to instruct the generative model through various intuitive mechanisms during the whole generation process, e.g. text and image prompts, painting, drag-and-drop, etc. We propose a Synergistic Multimodal Instruction mechanism, designed to seamlessly integrate users' multimodal instructions into generative models, thus facilitating a cooperative and responsive interaction between user inputs and the generative process. This approach enables iterative and fine-grained refinement of the generation result through precise and effective user instructions. With $\textit{InteractiveVideo}$, users are given the flexibility to meticulously tailor key aspects of a video. They can paint the reference image, edit semantics, and adjust video motions until their requirements are fully met. Code, models, and demo are available at this https URL
- [823] arXiv:2402.03049 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: EasyInstruct: An Easy-to-use Instruction Processing Framework for Large Language ModelsYixin Ou , Ningyu Zhang , Honghao Gui , Ziwen Xu , Shuofei Qiao , Yida Xue , Runnan Fang , Kangwei Liu , Lei Li , Zhen Bi , Guozhou Zheng , Huajun ChenComments: Project website: this https URL Code: this https URL Video: this https URL Demo: this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Abstract: In recent years, instruction tuning has gained increasing attention and emerged as a crucial technique to enhance the capabilities of Large Language Models (LLMs). To construct high-quality instruction datasets, many instruction processing approaches have been proposed, aiming to achieve a delicate balance between data quantity and data quality. Nevertheless, due to inconsistencies that persist among various instruction processing methods, there is no standard open-source instruction processing implementation framework available for the community, which hinders practitioners from further developing and advancing. To facilitate instruction processing research and development, we present EasyInstruct, an easy-to-use instruction processing framework for LLMs, which modularizes instruction generation, selection, and prompting, while also considering their combination and interaction. EasyInstruct is publicly released and actively maintained at this https URL , along with an online demo app and a demo video for quick-start, calling for broader research centered on instruction data and synthetic data.
- [824] arXiv:2402.03067 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Multilingual transformer and BERTopic for short text topic modeling: The case of SerbianJournal-ref: Trajanovic, M., Filipovic, N., Zdravkovic, M. (eds) Disruptive Information Technologies for a Smart Society. ICIST 2023. Lecture Notes in Networks and Systems, vol 872. Springer, ChamSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: This paper presents the results of the first application of BERTopic, a state-of-the-art topic modeling technique, to short text written in a morphologi-cally rich language. We applied BERTopic with three multilingual embed-ding models on two levels of text preprocessing (partial and full) to evalu-ate its performance on partially preprocessed short text in Serbian. We also compared it to LDA and NMF on fully preprocessed text. The experiments were conducted on a dataset of tweets expressing hesitancy toward COVID-19 vaccination. Our results show that with adequate parameter setting, BERTopic can yield informative topics even when applied to partially pre-processed short text. When the same parameters are applied in both prepro-cessing scenarios, the performance drop on partially preprocessed text is minimal. Compared to LDA and NMF, judging by the keywords, BERTopic offers more informative topics and gives novel insights when the number of topics is not limited. The findings of this paper can be significant for re-searchers working with other morphologically rich low-resource languages and short text.
- [825] arXiv:2402.03072 (cross-list from q-bio.NC) [ pdf , ps , other ]
-
Title: Learning to Abstract Visuomotor Mappings using Meta-Reinforcement LearningSubjects: Neurons and Cognition (q-bio.NC) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: We investigated the human capacity to acquire multiple visuomotor mappings for de novo skills. Using a grid navigation paradigm, we tested whether contextual cues implemented as different "grid worlds", allow participants to learn two distinct key-mappings more efficiently. Our results indicate that when contextual information is provided, task performance is significantly better. The same held true for meta-reinforcement learning agents that differed in whether or not they receive contextual information when performing the task. We evaluated their accuracy in predicting human performance in the task and analyzed their internal representations. The results indicate that contextual cues allow the formation of separate representations in space and time when using different visuomotor mappings, whereas the absence of them favors sharing one representation. While both strategies can allow learning of multiple visuomotor mappings, we showed contextual cues provide a computational advantage in terms of how many mappings can be learned.
- [826] arXiv:2402.03081 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: Preference-Conditioned Language-Guided AbstractionAndi Peng , Andreea Bobu , Belinda Z. Li , Theodore R. Sumers , Ilia Sucholutsky , Nishanth Kumar , Thomas L. Griffiths , Julie A. ShahComments: HRI 2024Subjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Learning from demonstrations is a common way for users to teach robots, but it is prone to spurious feature correlations. Recent work constructs state abstractions, i.e. visual representations containing task-relevant features, from language as a way to perform more generalizable learning. However, these abstractions also depend on a user's preference for what matters in a task, which may be hard to describe or infeasible to exhaustively specify using language alone. How do we construct abstractions to capture these latent preferences? We observe that how humans behave reveals how they see the world. Our key insight is that changes in human behavior inform us that there are differences in preferences for how humans see the world, i.e. their state abstractions. In this work, we propose using language models (LMs) to query for those preferences directly given knowledge that a change in behavior has occurred. In our framework, we use the LM in two ways: first, given a text description of the task and knowledge of behavioral change between states, we query the LM for possible hidden preferences; second, given the most likely preference, we query the LM to construct the state abstraction. In this framework, the LM is also able to ask the human directly when uncertain about its own estimate. We demonstrate our framework's ability to construct effective preference-conditioned abstractions in simulated experiments, a user study, as well as on a real Spot robot performing mobile manipulation tasks.
- [827] arXiv:2402.03099 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Intent-based Prompt Calibration: Enhancing prompt optimization with synthetic boundary casesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Prompt engineering is a challenging and important task due to the high sensitivity of Large Language Models (LLMs) to the given prompt and the inherent ambiguity of a textual task instruction. Automatic prompt engineering is essential to achieve optimized performance from LLMs. Recent studies have demonstrated the capabilities of LLMs to automatically conduct prompt engineering by employing a meta-prompt that incorporates the outcomes of the last trials and proposes an improved prompt. However, this requires a high-quality benchmark to compare different prompts, which is difficult and expensive to acquire in many real-world use cases. In this work, we introduce a new method for automatic prompt engineering, using a calibration process that iteratively refines the prompt to the user intent. During the optimization process, the system jointly generates synthetic data of boundary use cases and optimizes the prompt according to the generated dataset. We demonstrate the effectiveness of our method with respect to strong proprietary models on real-world tasks such as moderation and generation. Our method outperforms state-of-the-art methods with a limited number of annotated samples. Furthermore, we validate the advantages of each one of the system's key components. Our system is built in a modular way, facilitating easy adaptation to other tasks. The code is available $\href{ this https URL }{here}$.
- [828] arXiv:2402.03110 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Non-Stationary Latent Auto-Regressive BanditsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: We consider the stochastic multi-armed bandit problem with non-stationary rewards. We present a novel formulation of non-stationarity in the environment where changes in the mean reward of the arms over time are due to some unknown, latent, auto-regressive (AR) state of order $k$. We call this new environment the latent AR bandit. Different forms of the latent AR bandit appear in many real-world settings, especially in emerging scientific fields such as behavioral health or education where there are few mechanistic models of the environment. If the AR order $k$ is known, we propose an algorithm that achieves $\tilde{O}(k\sqrt{T})$ regret in this setting. Empirically, our algorithm outperforms standard UCB across multiple non-stationary environments, even if $k$ is mis-specified.
- [829] arXiv:2402.03112 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Infrared Spectra Prediction for Diazo Groups Utilizing a Machine Learning Approach with Structural Attention MechanismComments: 21 pages, 5 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Chemical Physics (physics.chem-ph)
Abstract: Infrared (IR) spectroscopy is a pivotal technique in chemical research for elucidating molecular structures and dynamics through vibrational and rotational transitions. However, the intricate molecular fingerprints characterized by unique vibrational and rotational patterns present substantial analytical challenges. Here, we present a machine learning approach employing a Structural Attention Mechanism tailored to enhance the prediction and interpretation of infrared spectra, particularly for diazo compounds. Our model distinguishes itself by honing in on chemical information proximal to functional groups, thereby significantly bolstering the accuracy, robustness, and interpretability of spectral predictions. This method not only demystifies the correlations between infrared spectral features and molecular structures but also offers a scalable and efficient paradigm for dissecting complex molecular interactions.
- [830] arXiv:2402.03119 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Good Teachers Explain: Explanation-Enhanced Knowledge DistillationComments: 21 pages, 12 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Knowledge Distillation (KD) has proven effective for compressing large teacher models into smaller student models. While it is well known that student models can achieve similar accuracies as the teachers, it has also been shown that they nonetheless often do not learn the same function. It is, however, often highly desirable that the student's and teacher's functions share similar properties such as basing the prediction on the same input features, as this ensures that students learn the 'right features' from the teachers. In this work, we explore whether this can be achieved by not only optimizing the classic KD loss but also the similarity of the explanations generated by the teacher and the student. Despite the idea being simple and intuitive, we find that our proposed 'explanation-enhanced' KD (e$^2$KD) (1) consistently provides large gains in terms of accuracy and student-teacher agreement, (2) ensures that the student learns from the teacher to be right for the right reasons and to give similar explanations, and (3) is robust with respect to the model architectures, the amount of training data, and even works with 'approximate', pre-computed explanations.
- [831] arXiv:2402.03130 (cross-list from cs.SE) [ pdf , ps , html , other ]
-
Title: Evaluation of ChatGPT Usability as A Code Generation ToolSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: With the rapid advance of machine learning (ML) technology, large language models (LLMs) are increasingly explored as an intelligent tool to generate program code from natural language specifications. However, existing evaluations of LLMs have focused on their capabilities in comparison with humans. It is desirable to evaluate their usability when deciding on whether to use a LLM in software production. This paper proposes a user centric method. It includes metadata in the test cases of a benchmark to describe their usages, conducts testing in a multi-attempt process that mimic the uses of LLMs, measures LLM generated solutions on a set of quality attributes that reflect usability, and evaluates the performance based on user experiences in the uses of LLMs as a tool. The paper reports an application of the method in the evaluation of ChatGPT usability as a code generation tool for the R programming language. Our experiments demonstrated that ChatGPT is highly useful for generating R program code although it may fail on hard programming tasks. The user experiences are good with overall average number of attempts being 1.61 and the average time of completion being 47.02 seconds. Our experiments also found that the weakest aspect of usability is conciseness, which has a score of 3.80 out of 5. Our experiment also shows that it is hard for human developers to learn from experiences to improve the skill of using ChatGPT to generate code.
- [832] arXiv:2402.03138 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Just Cluster It: An Approach for Exploration in High-Dimensions using Clustering and Pre-Trained RepresentationsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In this paper we adopt a representation-centric perspective on exploration in reinforcement learning, viewing exploration fundamentally as a density estimation problem. We investigate the effectiveness of clustering representations for exploration in 3-D environments, based on the observation that the importance of pixel changes between transitions is less pronounced in 3-D environments compared to 2-D environments, where pixel changes between transitions are typically distinct and significant. We propose a method that performs episodic and global clustering on random representations and on pre-trained DINO representations to count states, i.e, estimate pseudo-counts. Surprisingly, even random features can be clustered effectively to count states in 3-D environments, however when these become visually more complex, pre-trained DINO representations are more effective thanks to the pre-trained inductive biases in the representations. Overall, this presents a pathway for integrating pre-trained biases into exploration. We evaluate our approach on the VizDoom and Habitat environments, demonstrating that our method surpasses other well-known exploration methods in these settings.
- [833] arXiv:2402.03141 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Boosting Long-Delayed Reinforcement Learning with Auxiliary Short-Delayed TaskSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract: Reinforcement learning is challenging in delayed scenarios, a common real-world situation where observations and interactions occur with delays. State-of-the-art (SOTA) state-augmentation techniques either suffer from the state-space explosion along with the delayed steps, or performance degeneration in stochastic environments. To address these challenges, our novel Auxiliary-Delayed Reinforcement Learning (AD-RL) leverages an auxiliary short-delayed task to accelerate the learning on a long-delayed task without compromising the performance in stochastic environments. Specifically, AD-RL learns the value function in the short-delayed task and then employs it with the bootstrapping and policy improvement techniques in the long-delayed task. We theoretically show that this can greatly reduce the sample complexity compared to directly learning on the original long-delayed task. On deterministic and stochastic benchmarks, our method remarkably outperforms the SOTAs in both sample efficiency and policy performance.
- [834] arXiv:2402.03172 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Accurate and Well-Calibrated ICD Code Assignment Through Attention Over Diverse Label EmbeddingsComments: Accepted to EACL2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Although the International Classification of Diseases (ICD) has been adopted worldwide, manually assigning ICD codes to clinical text is time-consuming, error-prone, and expensive, motivating the development of automated approaches. This paper describes a novel approach for automated ICD coding, combining several ideas from previous related work. We specifically employ a strong Transformer-based model as a text encoder and, to handle lengthy clinical narratives, we explored either (a) adapting the base encoder model into a Longformer, or (b) dividing the text into chunks and processing each chunk independently. The representations produced by the encoder are combined with a label embedding mechanism that explores diverse ICD code synonyms. Experiments with different splits of the MIMIC-III dataset show that the proposed approach outperforms the current state-of-the-art models in ICD coding, with the label embeddings significantly contributing to the good performance. Our approach also leads to properly calibrated classification results, which can effectively inform downstream tasks such as quantification.
- [835] arXiv:2402.03173 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: MULTI: Multimodal Understanding Leaderboard with Text and ImagesZichen Zhu , Yang Xu , Lu Chen , Jingkai Yang , Yichuan Ma , Yiming Sun , Hailin Wen , Jiaqi Liu , Jinyu Cai , Yingzi Ma , Situo Zhang , Zihan Zhao , Liangtai Sun , Kai YuComments: 16 pages, 9 figures, 10 tables. Details and access are available at: this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Rapid progress in multimodal large language models (MLLMs) highlights the need to introduce challenging yet realistic benchmarks to the academic community, while existing benchmarks primarily focus on understanding simple natural images and short context. In this paper, we present MULTI as a cutting-edge benchmark for evaluating MLLMs on understanding complex tables and images, and reasoning with long context. MULTI provides multimodal inputs and requires responses that are either precise or open-ended, reflecting real-life examination styles. MULTI includes over 18,000 questions and challenges MLLMs with a variety of tasks, ranging from formula derivation to image detail analysis and cross-modality reasoning. We also introduce MULTI-Elite, a 500-question selected hard subset, and MULTI-Extend, with more than 4,500 external knowledge context pieces. Our evaluation indicates significant potential for MLLM advancement, with GPT-4V achieving a 63.7% accuracy rate on MULTI, in contrast to other MLLMs scoring between 28.5% and 55.3%. MULTI serves not only as a robust evaluation platform but also paves the way for the development of expert-level AI.
- [836] arXiv:2402.03175 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: The Matrix: A Bayesian learning model for LLMsComments: 12 pages, 6 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In this paper, we introduce a Bayesian learning model to understand the behavior of Large Language Models (LLMs). We explore the optimization metric of LLMs, which is based on predicting the next token, and develop a novel model grounded in this principle. Our approach involves constructing an ideal generative text model represented by a multinomial transition probability matrix with a prior, and we examine how LLMs approximate this matrix. We discuss the continuity of the mapping between embeddings and multinomial distributions, and present the Dirichlet approximation theorem to approximate any prior. Additionally, we demonstrate how text generation by LLMs aligns with Bayesian learning principles and delve into the implications for in-context learning, specifically explaining why in-context learning emerges in larger models where prompts are considered as samples to be updated. Our findings indicate that the behavior of LLMs is consistent with Bayesian Learning, offering new insights into their functioning and potential applications.
- [837] arXiv:2402.03176 (cross-list from cs.IR) [ pdf , ps , other ]
-
Title: Comparison of Topic Modelling Approaches in the Banking ContextComments: 14 pages, Journal of Applied ScienceJournal-ref: Applied Sciences (2023), 13(2), 797Subjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Computation (stat.CO)
Abstract: Topic modelling is a prominent task for automatic topic extraction in many applications such as sentiment analysis and recommendation systems. The approach is vital for service industries to monitor their customer discussions. The use of traditional approaches such as Latent Dirichlet Allocation (LDA) for topic discovery has shown great performances, however, they are not consistent in their results as these approaches suffer from data sparseness and inability to model the word order in a document. Thus, this study presents the use of Kernel Principal Component Analysis (KernelPCA) and K-means Clustering in the BERTopic architecture. We have prepared a new dataset using tweets from customers of Nigerian banks and we use this to compare the topic modelling approaches. Our findings showed KernelPCA and K-means in the BERTopic architecture-produced coherent topics with a coherence score of 0.8463.
- [838] arXiv:2402.03183 (cross-list from cs.SE) [ pdf , ps , other ]
-
Title: Predicting Configuration Performance in Multiple Environments with Sequential Meta-learningComments: This paper has been accepted by FSE'24Subjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Performance (cs.PF)
Abstract: Learning and predicting the performance of given software configurations are of high importance to many software engineering activities. While configurable software systems will almost certainly face diverse running environments (e.g., version, hardware, and workload), current work often either builds performance models under a single environment or fails to properly handle data from diverse settings, hence restricting their accuracy for new environments. In this paper, we target configuration performance learning under multiple environments. We do so by designing SeMPL - a meta-learning framework that learns the common understanding from configurations measured in distinct (meta) environments and generalizes them to the unforeseen, target environment. What makes it unique is that unlike common meta-learning frameworks (e.g., MAML and MetaSGD) that train the meta environments in parallel, we train them sequentially, one at a time. The order of training naturally allows discriminating the contributions among meta environments in the meta-model built, which fits better with the characteristic of configuration data that is known to dramatically differ between different environments. Through comparing with 15 state-of-the-art models under nine systems, our extensive experimental results demonstrate that SeMPL performs considerably better on 89% of the systems with up to 99% accuracy improvement, while being data-efficient, leading to a maximum of 3.86x speedup. All code and data can be found at our repository: this https URL .
- [839] arXiv:2402.03190 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Unified Hallucination Detection for Multimodal Large Language ModelsXiang Chen , Chenxi Wang , Yida Xue , Ningyu Zhang , Xiaoyan Yang , Qiang Li , Yue Shen , Lei Liang , Jinjie Gu , Huajun ChenComments: Work in progressSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM)
Abstract: Despite significant strides in multimodal tasks, Multimodal Large Language Models (MLLMs) are plagued by the critical issue of hallucination. The reliable detection of such hallucinations in MLLMs has, therefore, become a vital aspect of model evaluation and the safeguarding of practical application deployment. Prior research in this domain has been constrained by a narrow focus on singular tasks, an inadequate range of hallucination categories addressed, and a lack of detailed granularity. In response to these challenges, our work expands the investigative horizons of hallucination detection. We present a novel meta-evaluation benchmark, MHaluBench, meticulously crafted to facilitate the evaluation of advancements in hallucination detection methods. Additionally, we unveil a novel unified multimodal hallucination detection framework, UNIHD, which leverages a suite of auxiliary tools to validate the occurrence of hallucinations robustly. We demonstrate the effectiveness of UNIHD through meticulous evaluation and comprehensive analysis. We also provide strategic insights on the application of specific tools for addressing various categories of hallucinations.
- [840] arXiv:2402.03204 (cross-list from cs.IT) [ pdf , ps , other ]
-
Title: Multi-agent Reinforcement Learning for Energy Saving in Multi-Cell Massive MIMO SystemsSubjects: Information Theory (cs.IT) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: We develop a multi-agent reinforcement learning (MARL) algorithm to minimize the total energy consumption of multiple massive MIMO (multiple-input multiple-output) base stations (BSs) in a multi-cell network while preserving the overall quality-of-service (QoS) by making decisions on the multi-level advanced sleep modes (ASMs) and antenna switching of these BSs. The problem is modeled as a decentralized partially observable Markov decision process (DEC-POMDP) to enable collaboration between individual BSs, which is necessary to tackle inter-cell interference. A multi-agent proximal policy optimization (MAPPO) algorithm is designed to learn a collaborative BS control policy. To enhance its scalability, a modified version called MAPPO-neighbor policy is further proposed. Simulation results demonstrate that the trained MAPPO agent achieves better performance compared to baseline policies. Specifically, compared to the auto sleep mode 1 (symbol-level sleeping) algorithm, the MAPPO-neighbor policy reduces power consumption by approximately 8.7% during low-traffic hours and improves energy efficiency by approximately 19% during high-traffic hours, respectively.
- [841] arXiv:2402.03214 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Organic or Diffused: Can We Distinguish Human Art from AI-generated Images?Anna Yoo Jeong Ha , Josephine Passananti , Ronik Bhaskar , Shawn Shan , Reid Southen , Haitao Zheng , Ben Y. ZhaoSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The advent of generative AI images has completely disrupted the art world. Distinguishing AI generated images from human art is a challenging problem whose impact is growing over time. A failure to address this problem allows bad actors to defraud individuals paying a premium for human art and companies whose stated policies forbid AI imagery. It is also critical for content owners to establish copyright, and for model trainers interested in curating training data in order to avoid potential model collapse.
There are several different approaches to distinguishing human art from AI images, including classifiers trained by supervised learning, research tools targeting diffusion models, and identification by professional artists using their knowledge of artistic techniques. In this paper, we seek to understand how well these approaches can perform against today's modern generative models in both benign and adversarial settings. We curate real human art across 7 styles, generate matching images from 5 generative models, and apply 8 detectors (5 automated detectors and 3 different human groups including 180 crowdworkers, 4000+ professional artists, and 13 expert artists experienced at detecting AI). Both Hive and expert artists do very well, but make mistakes in different ways (Hive is weaker against adversarial perturbations while Expert artists produce higher false positives). We believe these weaknesses will remain as models continue to evolve, and use our data to demonstrate why a combined team of human and automated detectors provides the best combination of accuracy and robustness. - [842] arXiv:2402.03216 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge DistillationComments: Work in progressSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: In this paper, we present a new embedding model, called M3-Embedding, which is distinguished for its versatility in Multi-Linguality, Multi-Functionality, and Multi-Granularity. It can support more than 100 working languages, leading to new state-of-the-art performances on multi-lingual and cross-lingual retrieval tasks. It can simultaneously perform the three common retrieval functionalities of embedding model: dense retrieval, multi-vector retrieval, and sparse retrieval, which provides a unified model foundation for real-world IR applications. It is able to process inputs of different granularities, spanning from short sentences to long documents of up to 8192 tokens. The effective training of M3-Embedding involves the following technical contributions. We propose a novel self-knowledge distillation approach, where the relevance scores from different retrieval functionalities can be integrated as the teacher signal to enhance the training quality. We also optimize the batching strategy, enabling a large batch size and high training throughput to ensure the discriminativeness of embeddings. To the best of our knowledge, M3-Embedding is the first embedding model which realizes such a strong versatility. The model and code will be publicly available at this https URL .
- [843] arXiv:2402.03227 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: IGUANe: a 3D generalizable CycleGAN for multicenter harmonization of brain MR imagesComments: 23 pages, 8 figures; typos correctedSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: In MRI studies, the aggregation of imaging data from multiple acquisition sites enhances sample size but may introduce site-related variabilities that hinder consistency in subsequent analyses. Deep learning methods for image translation have emerged as a solution for harmonizing MR images across sites. In this study, we introduce IGUANe (Image Generation with Unified Adversarial Networks), an original 3D model that leverages the strengths of domain translation and straightforward application of style transfer methods for multicenter brain MR image harmonization. IGUANe extends CycleGAN architecture by integrating an arbitrary number of domains for training through a many-to-one strategy. During inference, the model can be applied to any image, even from an unknown acquisition site, making it a universal generator for harmonization. Trained on a dataset comprising T1-weighted images from 11 different scanners, IGUANe was evaluated on data from unseen sites. The assessments included the transformation of MR images with traveling subjects, the preservation of pairwise distances between MR images within domains, the evolution of volumetric patterns related to age and Alzheimer$^\prime$s disease (AD), and the performance in age regression and patient classification tasks. Comparisons with other harmonization and normalization methods suggest that IGUANe better preserves individual information in MR images and is more suitable for maintaining and reinforcing variabilities related to age and AD. Future studies may further assess IGUANe in other multicenter contexts, either using the same model or retraining it for applications to different image modalities.
- [844] arXiv:2402.03246 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAMSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: We present SGS-SLAM, the first semantic visual SLAM system based on Gaussian Splatting. It incorporates appearance, geometry, and semantic features through multi-channel optimization, addressing the oversmoothing limitations of neural implicit SLAM systems in high-quality rendering, scene understanding, and object-level geometry. We introduce a unique semantic feature loss that effectively compensates for the shortcomings of traditional depth and color losses in object optimization. Through a semantic-guided keyframe selection strategy, we prevent erroneous reconstructions caused by cumulative errors. Extensive experiments demonstrate that SGS-SLAM delivers state-of-the-art performance in camera pose estimation, map reconstruction, precise semantic segmentation, and object-level geometric accuracy, while ensuring real-time rendering capabilities.
- [845] arXiv:2402.03247 (cross-list from cs.AR) [ pdf , ps , other ]
-
Title: HEANA: A Hybrid Time-Amplitude Analog Optical Accelerator with Flexible Dataflows for Energy-Efficient CNN InferenceComments: The paper is under review at ACM TODAESSubjects: Hardware Architecture (cs.AR) ; Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET)
Abstract: Several photonic microring resonators (MRRs) based analog accelerators have been proposed to accelerate the inference of integer-quantized CNNs with remarkably higher throughput and energy efficiency compared to their electronic counterparts. However, the existing analog photonic accelerators suffer from three shortcomings: (i) severe hampering of wavelength parallelism due to various crosstalk effects, (ii) inflexibility of supporting various dataflows other than the weight-stationary dataflow, and (iii) failure in fully leveraging the ability of photodetectors to perform in-situ accumulations. These shortcomings collectively hamper the performance and energy efficiency of prior accelerators. To tackle these shortcomings, we present a novel Hybrid timE Amplitude aNalog optical Accelerator, called HEANA. HEANA employs hybrid time-amplitude analog optical multipliers (TAOMs) that increase the flexibility of HEANA to support multiple dataflows. A spectrally hitless arrangement of TAOMs significantly reduces the crosstalk effects, thereby increasing the wavelength parallelism in HEANA. Moreover, HEANA employs our invented balanced photo-charge accumulators (BPCAs) that enable buffer-less, in-situ, temporal accumulations to eliminate the need to use reduction networks in HEANA, relieving it from related latency and energy overheads. Our evaluation for the inference of four modern CNNs indicates that HEANA provides improvements of atleast 66x and 84x in frames-per-second (FPS) and FPS/W (energy-efficiency), respectively, for equal-area comparisons, on gmean over two MRR-based analog CNN accelerators from prior work.
- [846] arXiv:2402.03251 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: CLIP Can Understand DepthSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Recent studies on generalizing CLIP for monocular depth estimation reveal that CLIP pre-trained on web-crawled data is inefficient for deriving proper similarities between image patches and depth-related prompts. In this paper, we adapt CLIP for meaningful quality of monocular depth estimation with dense prediction, without fine-tuning its original vision-language alignment. By jointly training a compact deconvolutional decoder with a tiny learnable embedding matrix named mirror, as a static prompt for its text encoder, CLIP is enabled to understand depth. With this approach, our model exhibits impressive performance matching several previous state-of-the-art vision-only models on the NYU Depth v2 and KITTI datasets, outperforming every CLIP-based depth estimation model with a large margin. Experiments on temporal depth consistency and spatial continuity demonstrate that the prior knowledge of CLIP can be effectively refined by our proposed framework. Furthermore, an ablation study on mirror proves that the resulting model estimates depth utilizing knowledge not only from the image encoder but also text encoder despite not being given any prompt written in a human way. This research demonstrates that through minimal adjustments, the prior knowledge of vision-language foundation models, such as CLIP, can be generalized even to domains where learning during pretraining is challenging. We facilitate future works focused on methods to adjust suboptimal prior knowledge of vision-language models using non-human language prompts, achieving performance on par with task-specific state-of-the-art methodologies.
- [847] arXiv:2402.03268 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths AggregationSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Pre-trained language models (LMs) are able to perform complex reasoning without explicit fine-tuning. To understand how pre-training with a next-token prediction objective contributes to the emergence of such reasoning capability, we propose that we can view an LM as deriving new conclusions by aggregating indirect reasoning paths seen at pre-training time. We found this perspective effective in two important cases of reasoning: logic reasoning with knowledge graphs (KGs) and math reasoning with math word problems (MWPs). More specifically, we formalize the reasoning paths as random walk paths on the knowledge/reasoning graphs. Analyses of learned LM distributions suggest that a weighted sum of relevant random walk path probabilities is a reasonable way to explain how LMs reason. Experiments and analysis on multiple KG and MWP datasets reveal the effect of training on random walk paths and suggest that augmenting unlabeled random walk reasoning paths can improve real-world multi-step reasoning performance. code: this https URL
- [848] arXiv:2402.03271 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language ModelsZhiyuan Hu , Chumin Liu , Xidong Feng , Yilun Zhao , See-Kiong Ng , Anh Tuan Luu , Junxian He , Pang Wei Koh , Bryan HooiComments: Under reviewSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: In the face of uncertainty, the ability to seek information is of fundamental importance. In many practical applications, such as medical diagnosis and troubleshooting, the information needed to solve the task is not initially given, and has to be actively sought by asking follow-up questions (for example, a doctor asking a patient for more details about their symptoms). In this work, we introduce Uncertainty of Thoughts (UoT), an algorithm to augment large language models with the ability to actively seek information by asking effective questions. UoT combines 1) an uncertainty-aware simulation approach which enables the model to simulate possible future scenarios and how likely they are to occur, 2) uncertainty-based rewards motivated by information gain which incentivizes the model to seek information, and 3) a reward propagation scheme to select the optimal question to ask in a way that maximizes the expected reward. In experiments on medical diagnosis, troubleshooting and the '20 Questions' game, UoT achieves an average performance improvement of 57.8% in the rate of successful task completion across multiple LLMs compared with direct prompting, and also improves efficiency (i.e., the number of questions needed to complete the task).
- [849] arXiv:2402.03282 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: A Framework for Partially Observed Reward-States in RLHFComments: 47 pages. 13 pages for the main paper, 34 pages for the references and appendixSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: The study of reinforcement learning from human feedback (RLHF) has gained prominence in recent years due to its role in the development of LLMs. Neuroscience research shows that human responses to stimuli are known to depend on partially-observed "internal states." Unfortunately current models of RLHF do not take take this into consideration. Moreover most RLHF models do not account for intermediate feedback, which is gaining importance in empirical work and can help improve both sample complexity and alignment. To address these limitations, we model RLHF as reinforcement learning with partially observed reward-states (PORRL). We show reductions from the the two dominant forms of human feedback in RLHF - cardinal and dueling feedback to PORRL. For cardinal feedback, we develop generic statistically efficient algorithms and instantiate them to present POR-UCRL and POR-UCBVI. For dueling feedback, we show that a naive reduction to cardinal feedback fails to achieve sublinear dueling regret. We then present the first explicit reduction that converts guarantees for cardinal regret to dueling regret. We show that our models and guarantees in both settings generalize and extend existing ones. Finally, we identify a recursive structure on our model that could improve the statistical and computational tractability of PORRL, giving examples from past work on RLHF as well as learning perfect reward machines, which PORRL subsumes.
- [850] arXiv:2402.03284 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Deal, or no deal (or who knows)? Forecasting Uncertainty in Conversations using Large Language ModelsComments: 2 Figures; 7 Tables; 27 pagesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Effective interlocutors account for the uncertain goals, beliefs, and emotions of others. But even the best human conversationalist cannot perfectly anticipate the trajectory of a dialogue. How well can language models represent inherent uncertainty in conversations? We propose FortUne Dial, an expansion of the long-standing "conversation forecasting" task: instead of just accuracy, evaluation is conducted with uncertainty-aware metrics, effectively enabling abstention on individual instances. We study two ways in which language models potentially represent outcome uncertainty (internally, using scores and directly, using tokens) and propose fine-tuning strategies to improve calibration of both representations. Experiments on eight difficult negotiation corpora demonstrate that our proposed fine-tuning strategies (a traditional supervision strategy and an off-policy reinforcement learning strategy) can calibrate smaller open-source models to compete with pre-trained models 10x their size.
- [851] arXiv:2402.03286 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Training-Free Consistent Text-to-Image GenerationComments: Project page is in this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
Abstract: Text-to-image models offer a new level of creative flexibility by allowing users to guide the image generation process through natural language. However, using these models to consistently portray the same subject across diverse prompts remains challenging. Existing approaches fine-tune the model to teach it new words that describe specific user-provided subjects or add image conditioning to the model. These methods require lengthy per-subject optimization or large-scale pre-training. Moreover, they struggle to align generated images with text prompts and face difficulties in portraying multiple subjects. Here, we present ConsiStory, a training-free approach that enables consistent subject generation by sharing the internal activations of the pretrained model. We introduce a subject-driven shared attention block and correspondence-based feature injection to promote subject consistency between images. Additionally, we develop strategies to encourage layout diversity while maintaining subject consistency. We compare ConsiStory to a range of baselines, and demonstrate state-of-the-art performance on subject consistency and text alignment, without requiring a single optimization step. Finally, ConsiStory can naturally extend to multi-subject scenarios, and even enable training-free personalization for common objects.
- [852] arXiv:2402.03289 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Make Every Move Count: LLM-based High-Quality RTL Code Generation Using MCTSMatthew DeLorenzo , Animesh Basak Chowdhury , Vasudev Gohil , Shailja Thakur , Ramesh Karri , Siddharth Garg , Jeyavijayan RajendranSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR)
Abstract: Existing large language models (LLMs) for register transfer level code generation face challenges like compilation failures and suboptimal power, performance, and area (PPA) efficiency. This is due to the lack of PPA awareness in conventional transformer decoding algorithms. In response, we present an automated transformer decoding algorithm that integrates Monte Carlo tree-search for lookahead, guiding the transformer to produce compilable, functionally correct, and PPA-optimized code. Empirical evaluation with a fine-tuned language model on RTL codesets shows that our proposed technique consistently generates functionally correct code compared to prompting-only methods and effectively addresses the PPA-unawareness drawback of naive large language models. For the largest design generated by the state-of-the-art LLM (16-bit adder), our technique can achieve a 31.8% improvement in the area-delay product.
- [853] arXiv:2402.03290 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: InstanceDiffusion: Instance-level Control for Image GenerationComments: Preprint; Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Text-to-image diffusion models produce high quality images but do not offer control over individual instances in the image. We introduce InstanceDiffusion that adds precise instance-level control to text-to-image diffusion models. InstanceDiffusion supports free-form language conditions per instance and allows flexible ways to specify instance locations such as simple single points, scribbles, bounding boxes or intricate instance segmentation masks, and combinations thereof. We propose three major changes to text-to-image models that enable precise instance-level control. Our UniFusion block enables instance-level conditions for text-to-image models, the ScaleU block improves image fidelity, and our Multi-instance Sampler improves generations for multiple instances. InstanceDiffusion significantly surpasses specialized state-of-the-art models for each location condition. Notably, on the COCO dataset, we outperform previous state-of-the-art by 20.4% AP$_{50}^\text{box}$ for box inputs, and 25.4% IoU for mask inputs.
- [854] arXiv:2402.03293 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Flora: Low-Rank Adapters Are Secretly Gradient CompressorsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: Despite large neural networks demonstrating remarkable abilities to complete different tasks, they require excessive memory usage to store the optimization states for training. To alleviate this, the low-rank adaptation (LoRA) is proposed to reduce the optimization states by training fewer parameters. However, LoRA restricts overall weight update matrices to be low-rank, limiting the model performance. In this work, we investigate the dynamics of LoRA and identify that it can be approximated by a random projection. Based on this observation, we propose Flora, which is able to achieve high-rank updates by resampling the projection matrices while enjoying the sublinear space complexity of optimization states. We conduct experiments across different tasks and model architectures to verify the effectiveness of our approach.
- [855] arXiv:2402.03295 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Ginger: An Efficient Curvature Approximation with Linear Complexity for General Neural NetworksSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
Abstract: Second-order optimization approaches like the generalized Gauss-Newton method are considered more powerful as they utilize the curvature information of the objective function with preconditioning matrices. Albeit offering tempting theoretical benefits, they are not easily applicable to modern deep learning. The major reason is due to the quadratic memory and cubic time complexity to compute the inverse of the matrix. These requirements are infeasible even with state-of-the-art hardware. In this work, we propose Ginger, an eigendecomposition for the inverse of the generalized Gauss-Newton matrix. Our method enjoys efficient linear memory and time complexity for each iteration. Instead of approximating the conditioning matrix, we directly maintain its inverse to make the approximation more accurate. We provide the convergence result of Ginger for non-convex objectives. Our experiments on different tasks with different model architectures verify the effectiveness of our method. Our code is publicly available.
- [856] arXiv:2402.03300 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language ModelsZhihong Shao , Peiyi Wang , Qihao Zhu , Runxin Xu , Junxiao Song , Xiao Bi , Haowei Zhang , Mingchuan Zhang , Y.K. Li , Y. Wu , Daya GuoSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Mathematical reasoning poses a significant challenge for language models due to its complex and structured nature. In this paper, we introduce DeepSeekMath 7B, which continues pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math-related tokens sourced from Common Crawl, together with natural language and code data. DeepSeekMath 7B has achieved an impressive score of 51.7% on the competition-level MATH benchmark without relying on external toolkits and voting techniques, approaching the performance level of Gemini-Ultra and GPT-4. Self-consistency over 64 samples from DeepSeekMath 7B achieves 60.9% on MATH. The mathematical reasoning capability of DeepSeekMath is attributed to two key factors: First, we harness the significant potential of publicly available web data through a meticulously engineered data selection pipeline. Second, we introduce Group Relative Policy Optimization (GRPO), a variant of Proximal Policy Optimization (PPO), that enhances mathematical reasoning abilities while concurrently optimizing the memory usage of PPO.
- [857] arXiv:2402.03303 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Nevermind: Instruction Override and Moderation in Large Language ModelsComments: 11 pagesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Given the impressive capabilities of recent Large Language Models (LLMs), we investigate and benchmark the most popular proprietary and different sized open source models on the task of explicit instruction following in conflicting situations, e.g. overrides. These include the ability of the model to override the knowledge within the weights of the model, the ability to override (or moderate) extracted knowledge in the prompt, and lastly the ability to perform a full jailbreak. Experimentation performed suggest several key findings to improve instruction following - larger models perform the best in following instructions that override internal and contextual instructions, and are obedient, even to a fault. When scaling to longer contexts via rope scaling, a significant buffer needs to be maintained from the edge of the perplexity cliff in order to maintain instruction following capabilities. Finally, we observe improving instruction following, and subsequently instruction overrides/jailbreaks, is fundamentally at odds with the ability of a language model to follow given safety filters or guidelines. Thus, we postulate the most effective approach for safe, trustworthy AI should be dealt external to the LLM itself.
- [858] arXiv:2402.03305 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Do Diffusion Models Learn Semantically Meaningful and Efficient Representations?Comments: 13 pages, 9 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Diffusion models are capable of impressive feats of image generation with uncommon juxtapositions such as astronauts riding horses on the moon with properly placed shadows. These outputs indicate the ability to perform compositional generalization, but how do the models do so? We perform controlled experiments on conditional DDPMs learning to generate 2D spherical Gaussian bumps centered at specified $x$- and $y$-positions. Our results show that the emergence of semantically meaningful latent representations is key to achieving high performance. En route to successful performance over learning, the model traverses three distinct phases of latent representations: (phase A) no latent structure, (phase B) a 2D manifold of disordered states, and (phase C) a 2D ordered manifold. Corresponding to each of these phases, we identify qualitatively different generation behaviors: 1) multiple bumps are generated, 2) one bump is generated but at inaccurate $x$ and $y$ locations, 3) a bump is generated at the correct $x$ and y location. Furthermore, we show that even under imbalanced datasets where features ($x$- versus $y$-positions) are represented with skewed frequencies, the learning process for $x$ and $y$ is coupled rather than factorized, demonstrating that simple vanilla-flavored diffusion models cannot learn efficient representations in which localization in $x$ and $y$ are factorized into separate 1D tasks. These findings suggest the need for future work to find inductive biases that will push generative models to discover and exploit factorizable independent structures in their inputs, which will be required to vault these models into more data-efficient regimes.
- [859] arXiv:2402.03311 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: HASSOD: Hierarchical Adaptive Self-Supervised Object DetectionComments: NeurIPS 2023Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The human visual perception system demonstrates exceptional capabilities in learning without explicit supervision and understanding the part-to-whole composition of objects. Drawing inspiration from these two abilities, we propose Hierarchical Adaptive Self-Supervised Object Detection (HASSOD), a novel approach that learns to detect objects and understand their compositions without human supervision. HASSOD employs a hierarchical adaptive clustering strategy to group regions into object masks based on self-supervised visual representations, adaptively determining the number of objects per image. Furthermore, HASSOD identifies the hierarchical levels of objects in terms of composition, by analyzing coverage relations between masks and constructing tree structures. This additional self-supervised learning task leads to improved detection performance and enhanced interpretability. Lastly, we abandon the inefficient multi-round self-training process utilized in prior methods and instead adapt the Mean Teacher framework from semi-supervised learning, which leads to a smoother and more efficient training process. Through extensive experiments on prevalent image datasets, we demonstrate the superiority of HASSOD over existing methods, thereby advancing the state of the art in self-supervised object detection. Notably, we improve Mask AR from 20.2 to 22.5 on LVIS, and from 17.0 to 26.0 on SA-1B. Project page: this https URL .
- [860] arXiv:2402.03316 (cross-list from q-bio.NC) [ pdf , ps , other ]
-
Title: Artificial Intelligence for EEG Prediction: Applied Chaos TheoryComments: 70 pages, 14 figures, project reportSubjects: Neurons and Cognition (q-bio.NC) ; Artificial Intelligence (cs.AI); Dynamical Systems (math.DS); Chaotic Dynamics (nlin.CD)
Abstract: In the present research, we delve into the intricate realm of electroencephalogram (EEG) data analysis, focusing on sequence-to-sequence prediction of data across 32 EEG channels. The study harmoniously fuses the principles of applied chaos theory and dynamical systems theory to engender a novel feature set, enriching the representational capacity of our deep learning model. The endeavour's cornerstone is a transformer-based sequence-to-sequence architecture, calibrated meticulously to capture the non-linear and high-dimensional temporal dependencies inherent in EEG sequences. Through judicious architecture design, parameter initialisation strategies, and optimisation techniques, we have navigated the intricate balance between computational expediency and predictive performance. Our model stands as a vanguard in EEG data sequence prediction, demonstrating remarkable generalisability and robustness. The findings not only extend our understanding of EEG data dynamics but also unveil a potent analytical framework that can be adapted to diverse temporal sequence prediction tasks in neuroscience and beyond.
- [861] arXiv:2402.03319 (cross-list from cs.NE) [ pdf , ps , html , other ]
-
Title: Physical Reservoir Computing Enabled by Solitary Waves and Biologically-Inspired Nonlinear Transformation of Input DataComments: The Supplementary Video can be found here: this https URLSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI); Chaotic Dynamics (nlin.CD); Pattern Formation and Solitons (nlin.PS); Fluid Dynamics (physics.flu-dyn)
Abstract: Reservoir computing (RC) systems can efficiently forecast chaotic time series using nonlinear dynamical properties of an artificial neural network of random connections. The versatility of RC systems has motivated further research on both hardware counterparts of traditional RC algorithms and more efficient RC-like schemes. Inspired by the nonlinear processes in a living biological brain and using solitary waves excited on the surface of a flowing liquid film, in this paper we experimentally validate a physical RC system that substitutes the effect of randomness for a nonlinear transformation of input data. Carrying out all operations using a microcontroller with a minimal computational power, we demonstrate that the so-designed RC system serves as a technically simple hardware counterpart to the `next-generation' improvement of the traditional RC algorithm.
- [862] arXiv:2402.03327 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Uni3D-LLM: Unifying Point Cloud Perception, Generation and Editing with Large Language ModelsDingning Liu , Xiaoshui Huang , Yuenan Hou , Zhihui Wang , Zhenfei Yin , Yongshun Gong , Peng Gao , Wanli OuyangComments: 10 pages, 6 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: In this paper, we introduce Uni3D-LLM, a unified framework that leverages a Large Language Model (LLM) to integrate tasks of 3D perception, generation, and editing within point cloud scenes. This framework empowers users to effortlessly generate and modify objects at specified locations within a scene, guided by the versatility of natural language descriptions. Uni3D-LLM harnesses the expressive power of natural language to allow for precise command over the generation and editing of 3D objects, thereby significantly enhancing operational flexibility and controllability. By mapping point cloud into the unified representation space, Uni3D-LLM achieves cross-application functionality, enabling the seamless execution of a wide array of tasks, ranging from the accurate instantiation of 3D objects to the diverse requirements of interactive design. Through a comprehensive suite of rigorous experiments, the efficacy of Uni3D-LLM in the comprehension, generation, and editing of point cloud has been validated. Additionally, we have assessed the impact of integrating a point cloud perception module on the generation and editing processes, confirming the substantial potential of our approach for practical applications.
- [863] arXiv:2402.03328 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Visual Enumeration is Challenging for Large-scale Generative AISubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Abstract: Humans can readily judge the number of objects in a visual scene, even without counting, and such a skill has been documented in many animal species and babies prior to language development and formal schooling. Numerical judgments are error-free for small sets, while for larger collections responses become approximate, with variability increasing proportionally to the target number. This response pattern is observed for items of all kinds, despite variation in object features (such as color or shape), suggesting that our visual number sense relies on abstract representations of numerosity. Here, we investigate whether large-scale generative Artificial Intelligence (AI) systems have a human-like number sense, which should allow them to reliably name the number of objects in simple visual stimuli or generate images containing a target number of items in the 1-10 range. Surprisingly, most of the foundation models considered have a poor number sense: They make striking errors even with small numbers, the response variability does not increase in a systematic way, and the pattern of errors depends on object category. Only the most recent proprietary systems exhibit signatures of a visual number sense. Our findings demonstrate that having an intuitive visual understanding of number remains challenging for foundation models, which in turn might be detrimental to the perceptual grounding of numeracy that in humans is crucial for mathematical learning.
- [864] arXiv:2402.03329 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Unsupervised Salient Patch Selection for Data-Efficient Reinforcement LearningSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: To improve the sample efficiency of vision-based deep reinforcement learning (RL), we propose a novel method, called SPIRL, to automatically extract important patches from input images. Following Masked Auto-Encoders, SPIRL is based on Vision Transformer models pre-trained in a self-supervised fashion to reconstruct images from randomly-sampled patches. These pre-trained models can then be exploited to detect and select salient patches, defined as hard to reconstruct from neighboring patches. In RL, the SPIRL agent processes selected salient patches via an attention module. We empirically validate SPIRL on Atari games to test its data-efficiency against relevant state-of-the-art methods, including some traditional model-based methods and keypoint-based models. In addition, we analyze our model's interpretability capabilities.
- [865] arXiv:2402.03337 (cross-list from cs.RO) [ pdf , ps , html , other ]
-
Title: Reinforcement-learning robotic sailboats: simulator and preliminary resultsEduardo Charles Vasconcellos (UFF), Ronald M Sampaio , André P D Araújo (UFF), Esteban Walter Gonzales Clua , Philippe Preux (SEQUEL, GRAppA - LIFL), Raphael Guerra , Luiz M G Gonçalves (UFRN), Luis Martí , Hernan Lira , Nayat Sanchez-PiJournal-ref: NeurIPS 2023 Workshop on Robot Learning Workshop: Pretraining, Fine-Tuning, and Generalization with Large Scale Models, Dec 2023, New Orelans, United StatesSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This work focuses on the main challenges and problems in developing a virtual oceanic environment reproducing real experiments using Unmanned Surface Vehicles (USV) digital twins. We introduce the key features for building virtual worlds, considering using Reinforcement Learning (RL) agents for autonomous navigation and control. With this in mind, the main problems concern the definition of the simulation equations (physics and mathematics), their effective implementation, and how to include strategies for simulated control and perception (sensors) to be used with RL. We present the modeling, implementation steps, and challenges required to create a functional digital twin based on a real robotic sailing vessel. The application is immediate for developing navigation algorithms based on RL to be applied on real boats.
- [866] arXiv:2402.03347 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Transfer Learning With Densenet201 Architecture Model For Potato Leaf Disease ClassificationSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Potato plants are plants that are beneficial to humans. Like other plants in general, potato plants also have diseases; if this disease is not treated immediately, there will be a significant decrease in food production. Therefore, it is necessary to detect diseases quickly and precisely so that disease control can be carried out effectively and efficiently. Classification of potato leaf disease can be done directly. Still, the symptoms cannot always explain the type of disease that attacks potato leaves because there are many types of diseases with symptoms that look the same. Humans also have deficiencies in determining the results of identification of potato leaf disease, so sometimes the results of identification between individuals can be different. Therefore, the use of Deep Learning for the classification process of potato leaf disease is expected to shorten the time and have a high classification accuracy. This study uses a deep learning method with the DenseNet201 architecture. The choice to use the DenseNet201 algorithm in this study is because the model can identify important features of potato leaves and recognize early signs of emerging diseases. This study aimed to evaluate the effectiveness of the transfer learning method with the DenseNet201 architecture in increasing the classification accuracy of potato leaf disease compared to traditional classification methods. This study uses two types of scenarios, namely, comparing the number of dropouts and comparing the three optimizers. This test produces the best model using dropout 0.1 and Adam optimizer with an accuracy of 99.5% for training, 95.2% for validation, and 96% for the confusion matrix. In this study, using data testing, as many as 40 images were tested into the model that has been built. The test results on this model resulted in a new accuracy for classifying potato leaf disease, namely 92.5%.
- [867] arXiv:2402.03348 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Respect the model: Fine-grained and Robust Explanation with Sharing Ratio DecompositionComments: To be published in ICLR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: The truthfulness of existing explanation methods in authentically elucidating the underlying model's decision-making process has been questioned. Existing methods have deviated from faithfully representing the model, thus susceptible to adversarial attacks. To address this, we propose a novel eXplainable AI (XAI) method called SRD (Sharing Ratio Decomposition), which sincerely reflects the model's inference process, resulting in significantly enhanced robustness in our explanations. Different from the conventional emphasis on the neuronal level, we adopt a vector perspective to consider the intricate nonlinear interactions between filters. We also introduce an interesting observation termed Activation-Pattern-Only Prediction (APOP), letting us emphasize the importance of inactive neurons and redefine relevance encapsulating all relevant information including both active and inactive neurons. Our method, SRD, allows for the recursive decomposition of a Pointwise Feature Vector (PFV), providing a high-resolution Effective Receptive Field (ERF) at any layer.
- [868] arXiv:2402.03349 (cross-list from physics.geo-ph) [ pdf , ps , html , other ]
-
Title: When Geoscience Meets Generative AI and Large Language Models: Foundations, Trends, and Future ChallengesSubjects: Geophysics (physics.geo-ph) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph)
Abstract: Generative Artificial Intelligence (GAI) represents an emerging field that promises the creation of synthetic data and outputs in different modalities. GAI has recently shown impressive results across a large spectrum of applications ranging from biology, medicine, education, legislation, computer science, and finance. As one strives for enhanced safety, efficiency, and sustainability, generative AI indeed emerges as a key differentiator and promises a paradigm shift in the field. This paper explores the potential applications of generative AI and large language models in geoscience. The recent developments in the field of machine learning and deep learning have enabled the generative model's utility for tackling diverse prediction problems, simulation, and multi-criteria decision-making challenges related to geoscience and Earth system dynamics. This survey discusses several GAI models that have been used in geoscience comprising generative adversarial networks (GANs), physics-informed neural networks (PINNs), and generative pre-trained transformer (GPT)-based structures. These tools have helped the geoscience community in several applications, including (but not limited to) data generation/augmentation, super-resolution, panchromatic sharpening, haze removal, restoration, and land surface changing. Some challenges still remain such as ensuring physical interpretation, nefarious use cases, and trustworthiness. Beyond that, GAI models show promises to the geoscience community, especially with the support to climate change, urban science, atmospheric science, marine science, and planetary science through their extraordinary ability to data-driven modeling and uncertainty quantification.
- [869] arXiv:2402.03355 (cross-list from cs.SI) [ pdf , ps , other ]
-
Title: Unlocking Criminal Hierarchies: A Survey, Experimental, and Comparative Exploration of Techniques for Identifying Leaders within Criminal NetworksSubjects: Social and Information Networks (cs.SI) ; Artificial Intelligence (cs.AI)
Abstract: This survey paper offers a thorough analysis of techniques and algorithms used in the identification of crime leaders within criminal networks. For each technique, the paper examines its effectiveness, limitations, potential for improvement, and future prospects. The main challenge faced by existing survey papers focusing on algorithms for identifying crime leaders and predicting crimes is effectively categorizing these algorithms. To address this limitation, this paper proposes a new methodological taxonomy that hierarchically classifies algorithms into more detailed categories and specific techniques. The paper includes empirical and experimental evaluations to rank the different techniques. The combination of the methodological taxonomy, empirical evaluations, and experimental comparisons allows for a nuanced and comprehensive understanding of the techniques and algorithms for identifying crime leaders, assisting researchers in making informed decisions. Moreover, the paper offers valuable insights into the future prospects of techniques for identifying crime leaders, emphasizing potential advancements and opportunities for further research. Here's an overview of our empirical analysis findings and experimental insights, along with the solution we've devised: (1) PageRank and Eigenvector centrality are reliable for mapping network connections, (2) Katz Centrality can effectively identify influential criminals through indirect links, stressing their significance in criminal networks, (3) current models fail to account for the specific impacts of criminal influence levels, the importance of socio-economic context, and the dynamic nature of criminal networks and hierarchies, and (4) we propose enhancements, such as incorporating temporal dynamics and sentiment analysis to reflect the fluidity of criminal activities and relationships
- [870] arXiv:2402.03357 (cross-list from cs.SI) [ pdf , ps , html , other ]
-
Title: Harnessing Network Effect for Fake News Mitigation: Selecting Debunkers via Self-Imitation LearningComments: 10 pages, full version of this paper is accepted by AAAI'24Subjects: Social and Information Networks (cs.SI) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This study aims to minimize the influence of fake news on social networks by deploying debunkers to propagate true news. This is framed as a reinforcement learning problem, where, at each stage, one user is selected to propagate true news. A challenging issue is episodic reward where the "net" effect of selecting individual debunkers cannot be discerned from the interleaving information propagation on social networks, and only the collective effect from mitigation efforts can be observed. Existing Self-Imitation Learning (SIL) methods have shown promise in learning from episodic rewards, but are ill-suited to the real-world application of fake news mitigation because of their poor sample efficiency. To learn a more effective debunker selection policy for fake news mitigation, this study proposes NAGASIL - Negative sampling and state Augmented Generative Adversarial Self-Imitation Learning, which consists of two improvements geared towards fake news mitigation: learning from negative samples, and an augmented state representation to capture the "real" environment state by integrating the current observed state with the previous state-action pairs from the same campaign. Experiments on two social networks show that NAGASIL yields superior performance to standard GASIL and state-of-the-art fake news mitigation models.
- [871] arXiv:2402.03358 (cross-list from cs.SI) [ pdf , ps , other ]
-
Title: A Comprehensive Survey on Graph Reduction: Sparsification, Coarsening, and CondensationComments: 16 pages, 3 tables, 2 figuresSubjects: Social and Information Networks (cs.SI) ; Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
Abstract: Many real-world datasets can be naturally represented as graphs, spanning a wide range of domains. However, the increasing complexity and size of graph datasets present significant challenges for analysis and computation. In response, graph reduction techniques have gained prominence for simplifying large graphs while preserving essential properties. In this survey, we aim to provide a comprehensive understanding of graph reduction methods, including graph sparsification, graph coarsening, and graph condensation. Specifically, we establish a unified definition for these methods and introduce a hierarchical taxonomy to categorize the challenges they address. Our survey then systematically reviews the technical details of these methods and emphasizes their practical applications across diverse scenarios. Furthermore, we outline critical research directions to ensure the continued effectiveness of graph reduction techniques, as well as provide a comprehensive paper list at this https URL . We hope this survey will bridge literature gaps and propel the advancement of this promising field.
- [872] arXiv:2402.03362 (cross-list from cs.IR) [ pdf , ps , html , other ]
-
Title: NanoNER: Named Entity Recognition for nanobiology using experts' knowledge and distant supervisionSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Here we present the training and evaluation of NanoNER, a Named Entity Recognition (NER) model for Nanobiology. NER consists in the identification of specific entities in spans of unstructured texts and is often a primary task in Natural Language Processing (NLP) and Information Extraction. The aim of our model is to recognise entities previously identified by domain experts as constituting the essential knowledge of the domain. Relying on ontologies, which provide us with a domain vocabulary and taxonomy, we implemented an iterative process enabling experts to determine the entities relevant to the domain at hand. We then delve into the potential of distant supervision learning in NER, supporting how this method can increase the quantity of annotated data with minimal additional manpower. On our full corpus of 728 full-text nanobiology articles, containing more than 120k entity occurrences, NanoNER obtained a F1-score of 0.98 on the recognition of previously known entities. Our model also demonstrated its ability to discover new entities in the text, with precision scores ranging from 0.77 to 0.81. Ablation experiments further confirmed this and allowed us to assess the dependency of our approach on the external resources. It highlighted the dependency of the approach to the resource, while also confirming its ability to rediscover up to 30% of the ablated terms. This paper details the methodology employed, experimental design, and key findings, providing valuable insights and directions for future related researches on NER in specialized domain. Furthermore, since our approach require minimal manpower , we believe that it can be generalized to other specialized fields.
- [873] arXiv:2402.03366 (cross-list from cs.IR) [ pdf , ps , html , other ]
-
Title: Uncertainty-Aware Explainable Recommendation with Large Language ModelsYicui Peng , Hao Chen , Chingsheng Lin , Guo Huang , Jinrong Hu , Hui Guo , Bin Kong , Shu Hu , Xi Wu , Xin WangSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Providing explanations within the recommendation system would boost user satisfaction and foster trust, especially by elaborating on the reasons for selecting recommended items tailored to the user. The predominant approach in this domain revolves around generating text-based explanations, with a notable emphasis on applying large language models (LLMs). However, refining LLMs for explainable recommendations proves impractical due to time constraints and computing resource limitations. As an alternative, the current approach involves training the prompt rather than the LLM. In this study, we developed a model that utilizes the ID vectors of user and item inputs as prompts for GPT-2. We employed a joint training mechanism within a multi-task learning framework to optimize both the recommendation task and explanation task. This strategy enables a more effective exploration of users' interests, improving recommendation effectiveness and user satisfaction. Through the experiments, our method achieving 1.59 DIV, 0.57 USR and 0.41 FCR on the Yelp, TripAdvisor and Amazon dataset respectively, demonstrates superior performance over four SOTA methods in terms of explainability evaluation metric. In addition, we identified that the proposed model is able to ensure stable textual quality on the three public datasets.
- [874] arXiv:2402.03368 (cross-list from cs.IR) [ pdf , ps , other ]
-
Title: Empirical and Experimental Perspectives on Big Data in Recommendation Systems: A Comprehensive SurveySubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI)
Abstract: This survey paper provides a comprehensive analysis of big data algorithms in recommendation systems, addressing the lack of depth and precision in existing literature. It proposes a two-pronged approach: a thorough analysis of current algorithms and a novel, hierarchical taxonomy for precise categorization. The taxonomy is based on a tri-level hierarchy, starting with the methodology category and narrowing down to specific techniques. Such a framework allows for a structured and comprehensive classification of algorithms, assisting researchers in understanding the interrelationships among diverse algorithms and techniques. Covering a wide range of algorithms, this taxonomy first categorizes algorithms into four main analysis types: User and Item Similarity-Based Methods, Hybrid and Combined Approaches, Deep Learning and Algorithmic Methods, and Mathematical Modeling Methods, with further subdivisions into sub-categories and techniques. The paper incorporates both empirical and experimental evaluations to differentiate between the techniques. The empirical evaluation ranks the techniques based on four criteria. The experimental assessments rank the algorithms that belong to the same category, sub-category, technique, and sub-technique. Also, the paper illuminates the future prospects of big data techniques in recommendation systems, underscoring potential advancements and opportunities for further research in this field
- [875] arXiv:2402.03370 (cross-list from cs.IR) [ pdf , ps , html , other ]
-
Title: Detection of tortured phrases in scientific literatureJournal-ref: Proceedings of the 2nd Workshop on Information Extraction from Scientific Publications, Nov 2023, Bali, IndonesiaSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Digital Libraries (cs.DL)
Abstract: This paper presents various automatic detection methods to extract so called tortured phrases from scientific papers. These tortured phrases, e.g. flag to clamor instead of signal to noise, are the results of paraphrasing tools used to escape plagiarism detection. We built a dataset and evaluated several strategies to flag previously undocumented tortured phrases. The proposed and tested methods are based on language models and either on embeddings similarities or on predictions of masked token. We found that an approach using token prediction and that propagates the scores to the chunk level gives the best results. With a recall value of .87 and a precision value of .61, it could retrieve new tortured phrases to be submitted to domain experts for validation.
- [876] arXiv:2402.03379 (cross-list from cs.IR) [ pdf , ps , html , other ]
-
Title: Entire Chain Uplift Modeling with Context-Enhanced Learning for Intelligent MarketingYinqiu Huang , Shuli Wang , Min Gao , Xue Wei , Changhao Li , Chuan Luo , Yinhua Zhu , Xiong Xiao , Yi LuoComments: Accepted by WWW2024Subjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Uplift modeling, vital in online marketing, seeks to accurately measure the impact of various strategies, such as coupons or discounts, on different users by predicting the Individual Treatment Effect (ITE). In an e-commerce setting, user behavior follows a defined sequential chain, including impression, click, and conversion. Marketing strategies exert varied uplift effects at each stage within this chain, impacting metrics like click-through and conversion rate. Despite its utility, existing research has neglected to consider the inter-task across all stages impacts within a specific treatment and has insufficiently utilized the treatment information, potentially introducing substantial bias into subsequent marketing decisions. We identify these two issues as the chain-bias problem and the treatment-unadaptive problem. This paper introduces the Entire Chain UPlift method with context-enhanced learning (ECUP), devised to tackle these issues. ECUP consists of two primary components: 1) the Entire Chain-Enhanced Network, which utilizes user behavior patterns to estimate ITE throughout the entire chain space, models the various impacts of treatments on each task, and integrates task prior information to enhance context awareness across all stages, capturing the impact of treatment on different tasks, and 2) the Treatment-Enhanced Network, which facilitates fine-grained treatment modeling through bit-level feature interactions, thereby enabling adaptive feature adjustment. Extensive experiments on public and industrial datasets validate ECUPs effectiveness. Moreover, ECUP has been deployed on the Meituan food delivery platform, serving millions of daily active users, with the related dataset released for future research.
- [877] arXiv:2402.03384 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Survival and grade of the glioma prediction using transfer learningSantiago Valbuena Rubio , María Teresa García-Ordás , Oscar García-Olalla Olivera , Héctor Alaiz-Moretón , Maria-Inmaculada González-Alonso , José Alberto Benítez-AndradesJournal-ref: PeerJ Computer Science, Volume 9, December 2023, ID e1723Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Glioblastoma is a highly malignant brain tumor with a life expectancy of only 3 to 6 months without treatment. Detecting and predicting its survival and grade accurately are crucial. This study introduces a novel approach using transfer learning techniques. Various pre-trained networks, including EfficientNet, ResNet, VGG16, and Inception, were tested through exhaustive optimization to identify the most suitable architecture. Transfer learning was applied to fine-tune these models on a glioblastoma image dataset, aiming to achieve two objectives: survival and tumor grade prediction.The experimental results show 65% accuracy in survival prediction, classifying patients into short, medium, or long survival categories. Additionally, the prediction of tumor grade achieved an accuracy of 97%, accurately differentiating low-grade gliomas (LGG) and high-grade gliomas (HGG). The success of the approach is attributed to the effectiveness of transfer learning, surpassing the current state-of-the-art methods. In conclusion, this study presents a promising method for predicting the survival and grade of glioblastoma. Transfer learning demonstrates its potential in enhancing prediction models, particularly in scenarios with limited large datasets. These findings hold promise for improving diagnostic and treatment approaches for glioblastoma patients.
- [878] arXiv:2402.03386 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: A generalized decision tree ensemble based on the NeuralNetworks architecture: Distributed Gradient Boosting Forest (DGBF)Journal-ref: Applied Intelligence, Volume 53, July 2023, pages 22991-23003Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Tree ensemble algorithms as RandomForest and GradientBoosting are currently the dominant methods for modeling discrete or tabular data, however, they are unable to perform a hierarchical representation learning from raw data as NeuralNetworks does thanks to its multi-layered structure, which is a key feature for DeepLearning problems and modeling unstructured data. This limitation is due to the fact that tree algorithms can not be trained with back-propagation because of their mathematical nature. However, in this work, we demonstrate that the mathematical formulation of bagging and boosting can be combined together to define a graph-structured-tree-ensemble algorithm with a distributed representation learning process between trees naturally (without using back-propagation). We call this novel approach Distributed Gradient Boosting Forest (DGBF) and we demonstrate that both RandomForest and GradientBoosting can be expressed as particular graph architectures of DGBT. Finally, we see that the distributed learning outperforms both RandomForest and GradientBoosting in 7 out of 9 datasets.
- [879] arXiv:2402.03390 (cross-list from eess.IV) [ pdf , ps , other ]
-
Title: PixelGen: Rethinking Embedded Camera SystemsSubjects: Image and Video Processing (eess.IV) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Networking and Internet Architecture (cs.NI)
Abstract: Embedded camera systems are ubiquitous, representing the most widely deployed example of a wireless embedded system. They capture a representation of the world - the surroundings illuminated by visible or infrared light. Despite their widespread usage, the architecture of embedded camera systems has remained unchanged, which leads to limitations. They visualize only a tiny portion of the world. Additionally, they are energy-intensive, leading to limited battery lifespan. We present PixelGen, which re-imagines embedded camera systems. Specifically, PixelGen combines sensors, transceivers, and low-resolution image and infrared vision sensors to capture a broader world representation. They are deliberately chosen for their simplicity, low bitrate, and power consumption, culminating in an energy-efficient platform. We show that despite the simplicity, the captured data can be processed using transformer-based image and language models to generate novel representations of the environment. For example, we demonstrate that it can allow the generation of high-definition images, while the camera utilises low-power, low-resolution monochrome cameras. Furthermore, the capabilities of PixelGen extend beyond traditional photography, enabling visualization of phenomena invisible to conventional cameras, such as sound waves. PixelGen can enable numerous novel applications, and we demonstrate that it enables unique visualization of the surroundings that are then projected on extended reality headsets. We believe, PixelGen goes beyond conventional cameras and opens new avenues for research and photography.
- [880] arXiv:2402.03396 (cross-list from cs.SE) [ pdf , ps , html , other ]
-
Title: UniTSyn: A Large-Scale Dataset Capable of Enhancing the Prowess of Large Language Models for Program TestingComments: 8 pages, 5 figuresSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Abstract: The remarkable capability of large language models (LLMs) in generating high-quality code has drawn increasing attention in the software testing community. However, existing code LLMs often demonstrate unsatisfactory capabilities in generating accurate and complete tests since they were trained on code snippets collected without differentiating between code for testing purposes and other code. In this paper, we present a large-scale dataset UniTSyn, which is capable of enhancing the prowess of LLMs for Unit Test Synthesis. Associating tests with the tested functions is crucial for LLMs to infer the expected behavior and the logic paths to be verified. By leveraging Language Server Protocol, UniTSyn achieves the challenging goal of collecting focal-test pairs without per-project execution setups or per-language heuristics that tend to be fragile and difficult to scale. It contains 2.7 million focal-test pairs across five mainstream programming languages, making it possible to be utilized for enhancing the test generation ability of LLMs. The details of UniTSyn can be found in Table 1. Our experiments demonstrate that, by building an autoregressive model based on UniTSyn, we can achieve significant benefits in learning and understanding unit test representations, resulting in improved generation accuracy and code coverage across all evaluated programming languages. Code and data will be publicly available.
- [881] arXiv:2402.03435 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Psychological Assessments with Large Language Models: A Privacy-Focused and Cost-Effective ApproachComments: Accepted to the Workshop on Computational Linguistics and Clinical Psychology (CLPsych) at EACL 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: This study explores the use of Large Language Models (LLMs) to analyze text comments from Reddit users, aiming to achieve two primary objectives: firstly, to pinpoint critical excerpts that support a predefined psychological assessment of suicidal risk; and secondly, to summarize the material to substantiate the preassigned suicidal risk level. The work is circumscribed to the use of "open-source" LLMs that can be run locally, thereby enhancing data privacy. Furthermore, it prioritizes models with low computational requirements, making it accessible to both individuals and institutions operating on limited computing budgets. The implemented strategy only relies on a carefully crafted prompt and a grammar to guide the LLM's text completion. Despite its simplicity, the evaluation metrics show outstanding results, making it a valuable privacy-focused and cost-effective approach. This work is part of the Computational Linguistics and Clinical Psychology (CLPsych) 2024 shared task.
- [882] arXiv:2402.03457 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Efficient and Interpretable Traffic Destination Prediction using Explainable Boosting MachinesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Developing accurate models for traffic trajectory predictions is crucial for achieving fully autonomous driving. Various deep neural network models have been employed to address this challenge, but their black-box nature hinders transparency and debugging capabilities in a deployed system. Glass-box models offer a solution by providing full interpretability through methods like \ac{GAM}. In this study, we evaluate an efficient additive model called \ac{EBM} for traffic prediction on three popular mixed traffic datasets: \ac{SDD}, \ac{InD}, and Argoverse. Our results show that the \ac{EBM} models perform competitively in predicting pedestrian destinations within \ac{SDD} and \ac{InD} while providing modest predictions for vehicle-dominant Argoverse dataset. Additionally, our transparent trained models allow us to analyse feature importance and interactions, as well as provide qualitative examples of predictions explanation. The full training code will be made public upon publication.
- [883] arXiv:2402.03469 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Rethinking the Role of Proxy Rewards in Language Model AlignmentComments: Under review; PreprintSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Learning from human feedback via proxy reward modeling has been studied to align Large Language Models (LLMs) with human values. However, achieving reliable training through that proxy reward model (RM) is not a trivial problem, and its behavior remained as a black-box. In this paper, we study the role of proxy rewards in the LLM alignment via `reverse reward engineering' by composing interpretable features as a white-box reward function. We aim to replicate the ground truth (gold) reward signal by achieving a monotonic relationship between the proxy and gold reward signals after training the model using the proxy reward in reinforcement learning (RL). Our findings indicate that successfully emulating the gold reward requires generating responses that are relevant with enough length to open-ended questions, while also ensuring response consistency in closed-ended questions. Furthermore, resulting models optimizing our devised white-box reward show competitive performances with strong open-source RMs in alignment benchmarks. We highlight its potential usage as a simple but strong reward baseline for the LLM alignment, not requiring explicit human feedback dataset and RM training. Our code is available at this https URL .
- [884] arXiv:2402.03471 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: The Information of Large Language Model GeometrySubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Theory (cs.IT)
Abstract: This paper investigates the information encoded in the embeddings of large language models (LLMs). We conduct simulations to analyze the representation entropy and discover a power law relationship with model sizes. Building upon this observation, we propose a theory based on (conditional) entropy to elucidate the scaling law phenomenon. Furthermore, we delve into the auto-regressive structure of LLMs and examine the relationship between the last token and previous context tokens using information theory and regression techniques. Specifically, we establish a theoretical connection between the information gain of new tokens and ridge regression. Additionally, we explore the effectiveness of Lasso regression in selecting meaningful tokens, which sometimes outperforms the closely related attention weights. Finally, we conduct controlled experiments, and find that information is distributed across tokens, rather than being concentrated in specific "meaningful" tokens alone.
- [885] arXiv:2402.03479 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: DRED: Zero-Shot Transfer in Reinforcement Learning via Data-Regularised Environment DesignComments: A preliminary version of this work ( arXiv:2310.03494 ) was presented at the ALOE workshop, NeurIPS 2023Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Autonomous agents trained using deep reinforcement learning (RL) often lack the ability to successfully generalise to new environments, even when they share characteristics with the environments they have encountered during training. In this work, we investigate how the sampling of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents. We discover that, for deep actor-critic architectures sharing their base layers, prioritising levels according to their value loss minimises the mutual information between the agent's internal representation and the set of training levels in the generated training data. This provides a novel theoretical justification for the implicit regularisation achieved by certain adaptive sampling strategies. We then turn our attention to unsupervised environment design (UED) methods, which have more control over the data generation mechanism. We find that existing UED methods can significantly shift the training distribution, which translates to low ZSG performance. To prevent both overfitting and distributional shift, we introduce data-regularised environment design (DRED). DRED generates levels using a generative model trained over an initial set of level parameters, reducing distributional shift, and achieves significant improvements in ZSG over adaptive level sampling strategies and UED methods.
- [886] arXiv:2402.03480 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Trillion Parameter AI Serving Infrastructure for Scientific Discovery: A Survey and VisionNathaniel Hudson , J. Gregory Pauloski , Matt Baughman , Alok Kamatar , Mansi Sakarvadia , Logan Ward , Ryan Chard , André Bauer , Maksim Levental , Wenyi Wang , Will Engler , Owen Price Skelly , Ben Blaiszik , Rick Stevens , Kyle Chard , Ian FosterComments: 10 pages, 3 figures, accepted for publication in the proceedings of the 10th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT2023)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: Deep learning methods are transforming research, enabling new techniques, and ultimately leading to new discoveries. As the demand for more capable AI models continues to grow, we are now entering an era of Trillion Parameter Models (TPM), or models with more than a trillion parameters -- such as Huawei's PanGu-$\Sigma$. We describe a vision for the ecosystem of TPM users and providers that caters to the specific needs of the scientific community. We then outline the significant technical challenges and open problems in system design for serving TPMs to enable scientific research and discovery. Specifically, we describe the requirements of a comprehensive software stack and interfaces to support the diverse and flexible requirements of researchers.
- [887] arXiv:2402.03483 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: SWAG: Storytelling With Action GuidanceSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Automated long-form story generation typically employs long-context large language models (LLMs) for one-shot creation, which can produce cohesive but not necessarily engaging content. We introduce Storytelling With Action Guidance (SWAG), a novel approach to storytelling with LLMs. Our approach reduces story writing to a search problem through a two-model feedback loop: one LLM generates story content, and another auxiliary LLM is used to choose the next best "action" to steer the story's future direction. Our results show that SWAG can substantially outperform previous end-to-end story generation techniques when evaluated by GPT-4 and through human evaluation, and our SWAG pipeline using only open-source models surpasses GPT-3.5-Turbo.
- [888] arXiv:2402.03486 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Early prediction of onset of sepsis in Clinical SettingFahim Mohammad , Lakshmi Arunachalam , Samanway Sadhu , Boudewijn Aasman , Shweta Garg , Adil Ahmed , Silvie Colman , Meena Arunachalam , Sudhir Kulkarni , Parsa MirhajiComments: 16 pages, 6 figures and 7 tablesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract: This study proposes the use of Machine Learning models to predict the early onset of sepsis using deidentified clinical data from Montefiore Medical Center in Bronx, NY, USA. A supervised learning approach was adopted, wherein an XGBoost model was trained utilizing 80\% of the train dataset, encompassing 107 features (including the original and derived features). Subsequently, the model was evaluated on the remaining 20\% of the test data. The model was validated on prospective data that was entirely unseen during the training phase. To assess the model's performance at the individual patient level and timeliness of the prediction, a normalized utility score was employed, a widely recognized scoring methodology for sepsis detection, as outlined in the PhysioNet Sepsis Challenge paper. Metrics such as F1 Score, Sensitivity, Specificity, and Flag Rate were also devised. The model achieved a normalized utility score of 0.494 on test data and 0.378 on prospective data at threshold 0.3. The F1 scores were 80.8\% and 67.1\% respectively for the test data and the prospective data for the same threshold, highlighting its potential to be integrated into clinical decision-making processes effectively. These results bear testament to the model's robust predictive capabilities and its potential to substantially impact clinical decision-making processes.
- [889] arXiv:2402.03500 (cross-list from quant-ph) [ pdf , ps , html , other ]
-
Title: Curriculum reinforcement learning for quantum architecture search under hardware errorsYash J. Patel , Akash Kundu , Mateusz Ostaszewski , Xavier Bonet-Monroig , Vedran Dunjko , Onur DanaciComments: 32 pages, 11 figures, 6 tables. Accepted at ICLR 2024Subjects: Quantum Physics (quant-ph) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The key challenge in the noisy intermediate-scale quantum era is finding useful circuits compatible with current device limitations. Variational quantum algorithms (VQAs) offer a potential solution by fixing the circuit architecture and optimizing individual gate parameters in an external loop. However, parameter optimization can become intractable, and the overall performance of the algorithm depends heavily on the initially chosen circuit architecture. Several quantum architecture search (QAS) algorithms have been developed to design useful circuit architectures automatically. In the case of parameter optimization alone, noise effects have been observed to dramatically influence the performance of the optimizer and final outcomes, which is a key line of study. However, the effects of noise on the architecture search, which could be just as critical, are poorly understood. This work addresses this gap by introducing a curriculum-based reinforcement learning QAS (CRLQAS) algorithm designed to tackle challenges in realistic VQA deployment. The algorithm incorporates (i) a 3D architecture encoding and restrictions on environment dynamics to explore the search space of possible circuits efficiently, (ii) an episode halting scheme to steer the agent to find shorter circuits, and (iii) a novel variant of simultaneous perturbation stochastic approximation as an optimizer for faster convergence. To facilitate studies, we developed an optimized simulator for our algorithm, significantly improving computational efficiency in simulating noisy quantum circuits by employing the Pauli-transfer matrix formalism in the Pauli-Liouville basis. Numerical experiments focusing on quantum chemistry tasks demonstrate that CRLQAS outperforms existing QAS algorithms across several metrics in both noiseless and noisy environments.
- [890] arXiv:2402.03501 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: An Inpainting-Infused Pipeline for Attire and Background ReplacementFelipe Rodrigues Perche-Mahlow , André Felipe-Zanella , William Alberto Cruz-Castañeda , Marcellus AmadeusSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: In recent years, groundbreaking advancements in Generative Artificial Intelligence (GenAI) have triggered a transformative paradigm shift, significantly influencing various domains. In this work, we specifically explore an integrated approach, leveraging advanced techniques in GenAI and computer vision emphasizing image manipulation. The methodology unfolds through several stages, including depth estimation, the creation of inpaint masks based on depth information, the generation and replacement of backgrounds utilizing Stable Diffusion in conjunction with Latent Consistency Models (LCMs), and the subsequent replacement of clothes and application of aesthetic changes through an inpainting pipeline. Experiments conducted in this study underscore the methodology's efficacy, highlighting its potential to produce visually captivating content. The convergence of these advanced techniques allows users to input photographs of individuals and manipulate them to modify clothing and background based on specific prompts without manually input inpainting masks, effectively placing the subjects within the vast landscape of creative imagination.
- [891] arXiv:2402.03509 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Evaluating the Factuality of Zero-shot Summarizers Across Varied DomainsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Recent work has shown that large language models (LLMs) are capable of generating summaries zero-shot (i.e., without explicit supervision) that, under human assessment, are often comparable or even preferred to manually composed reference summaries. However, this prior work has focussed almost exclusively on evaluating news article summarization. How do zero-shot summarizers perform in other (potentially more specialized) domains? In this work we evaluate zero-shot generated summaries across specialized domains including biomedical articles, and legal bills (in addition to standard news benchmarks for reference). We focus especially on the factuality of outputs. We acquire annotations from domain experts to identify inconsistencies in summaries and systematically categorize these errors. We analyze whether the prevalence of a given domain in the pretraining corpus affects extractiveness and faithfulness of generated summaries of articles in this domain. We release all collected annotations to facilitate additional research toward measuring and realizing factually accurate summarization, beyond news articles. The dataset can be downloaded from this https URL
- [892] arXiv:2402.03519 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Resolving Transcription Ambiguity in Spanish: A Hybrid Acoustic-Lexical System for Punctuation RestorationComments: Accepted to UnImplicit workshop at EACL 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Punctuation restoration is a crucial step after Automatic Speech Recognition (ASR) systems to enhance transcript readability and facilitate subsequent NLP tasks. Nevertheless, conventional lexical-based approaches are inadequate for solving the punctuation restoration task in Spanish, where ambiguity can be often found between unpunctuated declaratives and questions. In this study, we propose a novel hybrid acoustic-lexical punctuation restoration system for Spanish transcription, which consolidates acoustic and lexical signals through a modular process. Our experiment results show that the proposed system can effectively improve F1 score of question marks and overall punctuation restoration on both public and internal Spanish conversational datasets. Additionally, benchmark comparison against LLMs (Large Language Model) indicates the superiority of our approach in accuracy, reliability and latency. Furthermore, we demonstrate that the Word Error Rate (WER) of the ASR module also benefits from our proposed system.
- [893] arXiv:2402.03525 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Deep Reinforcement Learning for Picker Routing Problem in WarehousingSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Order Picker Routing is a critical issue in Warehouse Operations Management. Due to the complexity of the problem and the need for quick solutions, suboptimal algorithms are frequently employed in practice. However, Reinforcement Learning offers an appealing alternative to traditional heuristics, potentially outperforming existing methods in terms of speed and accuracy. We introduce an attention based neural network for modeling picker tours, which is trained using Reinforcement Learning. Our method is evaluated against existing heuristics across a range of problem parameters to demonstrate its efficacy. A key advantage of our proposed method is its ability to offer an option to reduce the perceived complexity of routes.
- [894] arXiv:2402.03535 (cross-list from astro-ph.IM) [ pdf , ps , html , other ]
-
Title: Preliminary Report on Mantis Shrimp: a Multi-Survey Computer Vision Photometric Redshift ModelComments: 4 pages, 1 figure, 1 table. Submitted to AI4Differential Equations in Science Workshop at ICLR24. Public repository unavailable while under institutional reviewSubjects: Instrumentation and Methods for Astrophysics (astro-ph.IM) ; Artificial Intelligence (cs.AI)
Abstract: The availability of large, public, multi-modal astronomical datasets presents an opportunity to execute novel research that straddles the line between science of AI and science of astronomy. Photometric redshift estimation is a well-established subfield of astronomy. Prior works show that computer vision models typically outperform catalog-based models, but these models face additional complexities when incorporating images from more than one instrument or sensor. In this report, we detail our progress creating Mantis Shrimp, a multi-survey computer vision model for photometric redshift estimation that fuses ultra-violet (GALEX), optical (PanSTARRS), and infrared (UnWISE) imagery. We use deep learning interpretability diagnostics to measure how the model leverages information from the different inputs. We reason about the behavior of the CNNs from the interpretability metrics, specifically framing the result in terms of physically-grounded knowledge of galaxy properties.
- [895] arXiv:2402.03559 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Projected Generative Diffusion Models for Constraint SatisfactionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Generative diffusion models excel at robustly synthesizing coherent content from raw noise through a sequential process. However, their direct application in scenarios requiring outputs to adhere to specific, stringent criteria faces several severe challenges. This paper aims at overcome these challenges and introduces Projected Generative Diffusion Models (PGDM), an approach that recast traditional diffusion models sampling into a constrained-optimization problem. This enables the application of an iterative projections method to ensure that generated data faithfully adheres to specified constraints or physical principles. This paper provides theoretical support for the ability of PGDM to synthesize outputs from a feasible subdistribution under a restricted class of constraints while also providing large empirical evidence in the case of complex non-convex constraints and ordinary differential equations. These capabilities are demonstrated by physics-informed motion in video generation, trajectory optimization in path planning, and morphometric properties adherence in material science.
- [896] arXiv:2402.03561 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: VLN-Video: Utilizing Driving Videos for Outdoor Vision-and-Language NavigationComments: AAAI 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Outdoor Vision-and-Language Navigation (VLN) requires an agent to navigate through realistic 3D outdoor environments based on natural language instructions. The performance of existing VLN methods is limited by insufficient diversity in navigation environments and limited training data. To address these issues, we propose VLN-Video, which utilizes the diverse outdoor environments present in driving videos in multiple cities in the U.S. augmented with automatically generated navigation instructions and actions to improve outdoor VLN performance. VLN-Video combines the best of intuitive classical approaches and modern deep learning techniques, using template infilling to generate grounded navigation instructions, combined with an image rotation similarity-based navigation action predictor to obtain VLN style data from driving videos for pretraining deep learning VLN models. We pre-train the model on the Touchdown dataset and our video-augmented dataset created from driving videos with three proxy tasks: Masked Language Modeling, Instruction and Trajectory Matching, and Next Action Prediction, so as to learn temporally-aware and visually-aligned instruction representations. The learned instruction representation is adapted to the state-of-the-art navigator when fine-tuning on the Touchdown dataset. Empirical results demonstrate that VLN-Video significantly outperforms previous state-of-the-art models by 2.1% in task completion rate, achieving a new state-of-the-art on the Touchdown dataset.
- [897] arXiv:2402.03563 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Distinguishing the Knowable from the Unknowable with Language ModelsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: We study the feasibility of identifying epistemic uncertainty (reflecting a lack of knowledge), as opposed to aleatoric uncertainty (reflecting entropy in the underlying distribution), in the outputs of large language models (LLMs) over free-form text. In the absence of ground-truth probabilities, we explore a setting where, in order to (approximately) disentangle a given LLM's uncertainty, a significantly larger model stands in as a proxy for the ground truth. We show that small linear probes trained on the embeddings of frozen, pretrained models accurately predict when larger models will be more confident at the token level and that probes trained on one text domain generalize to others. Going further, we propose a fully unsupervised method that achieves non-trivial accuracy on the same task. Taken together, we interpret these results as evidence that LLMs naturally contain internal representations of different types of uncertainty that could potentially be leveraged to devise more informative indicators of model confidence in diverse practical settings.
- [898] arXiv:2402.03570 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Diffusion World ModelSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: We introduce Diffusion World Model (DWM), a conditional diffusion model capable of predicting multistep future states and rewards concurrently. As opposed to traditional one-step dynamics models, DWM offers long-horizon predictions in a single forward pass, eliminating the need for recursive queries. We integrate DWM into model-based value estimation, where the short-term return is simulated by future trajectories sampled from DWM. In the context of offline reinforcement learning, DWM can be viewed as a conservative value regularization through generative modeling. Alternatively, it can be seen as a data source that enables offline Q-learning with synthetic data. Our experiments on the D4RL dataset confirm the robustness of DWM to long-horizon simulation. In terms of absolute performance, DWM significantly surpasses one-step dynamics models with a $44\%$ performance gain, and achieves state-of-the-art performance.
- [899] arXiv:2402.03578 (cross-list from cs.MA) [ pdf , ps , html , other ]
-
Title: LLM Multi-Agent Systems: Challenges and Open ProblemsSubjects: Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI)
Abstract: This paper explores existing works of multi-agent systems and identifies challenges that remain inadequately addressed. By leveraging the diverse capabilities and roles of individual agents within a multi-agent system, these systems can tackle complex tasks through collaboration. We discuss optimizing task allocation, fostering robust reasoning through iterative debates, managing complex and layered context information, and enhancing memory management to support the intricate interactions within multi-agent systems. We also explore the potential application of multi-agent systems in blockchain systems to shed light on their future development and application in real-world distributed systems.
- [900] arXiv:2402.03583 (cross-list from cs.SI) [ pdf , ps , other ]
-
Title: MQuinE: a cure for "Z-paradox" in knowledge graph embedding modelsComments: 18pages, 1 figureSubjects: Social and Information Networks (cs.SI) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Knowledge graph embedding (KGE) models achieved state-of-the-art results on many knowledge graph tasks including link prediction and information retrieval. Despite the superior performance of KGE models in practice, we discover a deficiency in the expressiveness of some popular existing KGE models called \emph{Z-paradox}. Motivated by the existence of Z-paradox, we propose a new KGE model called \emph{MQuinE} that does not suffer from Z-paradox while preserves strong expressiveness to model various relation patterns including symmetric/asymmetric, inverse, 1-N/N-1/N-N, and composition relations with theoretical justification. Experiments on real-world knowledge bases indicate that Z-paradox indeed degrades the performance of existing KGE models, and can cause more than 20\% accuracy drop on some challenging test samples. Our experiments further demonstrate that MQuinE can mitigate the negative impact of Z-paradox and outperform existing KGE models by a visible margin on link prediction tasks.
- [901] arXiv:2402.03588 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Continual Domain Adversarial Adaptation via Double-Head DiscriminatorsComments: AISTATS 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Domain adversarial adaptation in a continual setting poses a significant challenge due to the limitations on accessing previous source domain data. Despite extensive research in continual learning, the task of adversarial adaptation cannot be effectively accomplished using only a small number of stored source domain data, which is a standard setting in memory replay approaches. This limitation arises from the erroneous empirical estimation of $\gH$-divergence with few source domain samples. To tackle this problem, we propose a double-head discriminator algorithm, by introducing an addition source-only domain discriminator that are trained solely on source learning phase. We prove that with the introduction of a pre-trained source-only domain discriminator, the empirical estimation error of $\gH$-divergence related adversarial loss is reduced from the source domain side. Further experiments on existing domain adaptation benchmark show that our proposed algorithm achieves more than 2$\%$ improvement on all categories of target domain adaptation task while significantly mitigating the forgetting on source domain.
- [902] arXiv:2402.03590 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Assessing the Impact of Distribution Shift on Reinforcement Learning PerformanceComments: Poster at the Workshop on Regulatable Machine Learning at the 37th Conference on Neural Information Processing Systems (RegML @ NeurIPS 2023)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Abstract: Research in machine learning is making progress in fixing its own reproducibility crisis. Reinforcement learning (RL), in particular, faces its own set of unique challenges. Comparison of point estimates, and plots that show successful convergence to the optimal policy during training, may obfuscate overfitting or dependence on the experimental setup. Although researchers in RL have proposed reliability metrics that account for uncertainty to better understand each algorithm's strengths and weaknesses, the recommendations of past work do not assume the presence of out-of-distribution observations. We propose a set of evaluation methods that measure the robustness of RL algorithms under distribution shifts. The tools presented here argue for the need to account for performance over time while the agent is acting in its environment. In particular, we recommend time series analysis as a method of observational RL evaluation. We also show that the unique properties of RL and simulated dynamic environments allow us to make stronger assumptions to justify the measurement of causal impact in our evaluations. We then apply these tools to single-agent and multi-agent environments to show the impact of introducing distribution shifts during test time. We present this methodology as a first step toward rigorous RL evaluation in the presence of distribution shifts.
- [903] arXiv:2402.03610 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM AgentsTomoyuki Kagaya , Thong Jing Yuan , Yuxuan Lou , Jayashree Karlekar , Sugiri Pranata , Akira Kinose , Koki Oguri , Felix Wick , Yang YouSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Owing to recent advancements, Large Language Models (LLMs) can now be deployed as agents for increasingly complex decision-making applications in areas including robotics, gaming, and API integration. However, reflecting past experiences in current decision-making processes, an innate human behavior, continues to pose significant challenges. Addressing this, we propose Retrieval-Augmented Planning (RAP) framework, designed to dynamically leverage past experiences corresponding to the current situation and context, thereby enhancing agents' planning capabilities. RAP distinguishes itself by being versatile: it excels in both text-only and multimodal environments, making it suitable for a wide range of tasks. Empirical evaluations demonstrate RAP's effectiveness, where it achieves SOTA performance in textual scenarios and notably enhances multimodal LLM agents' performance for embodied tasks. These results highlight RAP's potential in advancing the functionality and applicability of LLM agents in complex, real-world applications.
- [904] arXiv:2402.03616 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Leveraging Large Language Models for Hybrid Workplace Decision SupportSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)
Abstract: Large Language Models (LLMs) hold the potential to perform a variety of text processing tasks and provide textual explanations for proposed actions or decisions. In the era of hybrid work, LLMs can provide intelligent decision support for workers who are designing their hybrid work plans. In particular, they can offer suggestions and explanations to workers balancing numerous decision factors, thereby enhancing their work experience. In this paper, we present a decision support model for workspaces in hybrid work environments, leveraging the reasoning skill of LLMs. We first examine LLM's capability of making suitable workspace suggestions. We find that its reasoning extends beyond the guidelines in the prompt and the LLM can manage the trade-off among the available resources in the workspaces. We conduct an extensive user study to understand workers' decision process for workspace choices and evaluate the effectiveness of the system. We observe that a worker's decision could be influenced by the LLM's suggestions and explanations. The participants in our study find the system to be convenient, regardless of whether reasons are provided or not. Our results show that employees can benefit from the LLM-empowered system for their workspace selection in hybrid workplace.
- [905] arXiv:2402.03621 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Neural Network Approximators for Marginal MAP in Probabilistic CircuitsComments: Will appear in AAAI 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Probabilistic circuits (PCs) such as sum-product networks efficiently represent large multi-variate probability distributions. They are preferred in practice over other probabilistic representations such as Bayesian and Markov networks because PCs can solve marginal inference (MAR) tasks in time that scales linearly in the size of the network. Unfortunately, the maximum-a-posteriori (MAP) and marginal MAP (MMAP) tasks remain NP-hard in these models. Inspired by the recent work on using neural networks for generating near-optimal solutions to optimization problems such as integer linear programming, we propose an approach that uses neural networks to approximate (M)MAP inference in PCs. The key idea in our approach is to approximate the cost of an assignment to the query variables using a continuous multilinear function, and then use the latter as a loss function. The two main benefits of our new method are that it is self-supervised and after the neural network is learned, it requires only linear time to output a solution. We evaluate our new approach on several benchmark datasets and show that it outperforms three competing linear time approximations, max-product inference, max-marginal inference and sequential estimation, which are used in practice to solve MMAP tasks in PCs.
- [906] arXiv:2402.03627 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Partially Recentralization Softmax Loss for Vision-Language Models RobustnessSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: As Large Language Models make a breakthrough in natural language processing tasks (NLP), multimodal technique becomes extremely popular. However, it has been shown that multimodal NLP are vulnerable to adversarial attacks, where the outputs of a model can be dramatically changed by a perturbation to the input. While several defense techniques have been proposed both in computer vision and NLP models, the multimodal robustness of models have not been fully explored. In this paper, we study the adversarial robustness provided by modifying loss function of pre-trained multimodal models, by restricting top K softmax outputs. Based on the evaluation and scoring, our experiments show that after a fine-tuning, adversarial robustness of pre-trained models can be significantly improved, against popular attacks. Further research should be studying, such as output diversity, generalization and the robustness-performance trade-off of this kind of loss functions. Our code will be available after this paper is accepted
- [907] arXiv:2402.03630 (cross-list from cs.SE) [ pdf , ps , html , other ]
-
Title: Enhancing LLM-Based Coding Tools through Native Integration of IDE-Derived Static ContextSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: Large Language Models (LLMs) have achieved remarkable success in code completion, as evidenced by their essential roles in developing code assistant services such as Copilot. Being trained on in-file contexts, current LLMs are quite effective in completing code for single source files. However, it is challenging for them to conduct repository-level code completion for large software projects that require cross-file information. Existing research on LLM-based repository-level code completion identifies and integrates cross-file contexts, but it suffers from low accuracy and limited context length of LLMs. In this paper, we argue that Integrated Development Environments (IDEs) can provide direct, accurate and real-time cross-file information for repository-level code completion. We propose IDECoder, a practical framework that leverages IDE native static contexts for cross-context construction and diagnosis results for self-refinement. IDECoder utilizes the rich cross-context information available in IDEs to enhance the capabilities of LLMs of repository-level code completion. We conducted preliminary experiments to validate the performance of IDECoder and observed that this synergy represents a promising trend for future exploration.
- [908] arXiv:2402.03647 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: CAMBranch: Contrastive Learning with Augmented MILPs for BranchingSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Recent advancements have introduced machine learning frameworks to enhance the Branch and Bound (B\&B) branching policies for solving Mixed Integer Linear Programming (MILP). These methods, primarily relying on imitation learning of Strong Branching, have shown superior performance. However, collecting expert samples for imitation learning, particularly for Strong Branching, is a time-consuming endeavor. To address this challenge, we propose \textbf{C}ontrastive Learning with \textbf{A}ugmented \textbf{M}ILPs for \textbf{Branch}ing (CAMBranch), a framework that generates Augmented MILPs (AMILPs) by applying variable shifting to limited expert data from their original MILPs. This approach enables the acquisition of a considerable number of labeled expert samples. CAMBranch leverages both MILPs and AMILPs for imitation learning and employs contrastive learning to enhance the model's ability to capture MILP features, thereby improving the quality of branching decisions. Experimental results demonstrate that CAMBranch, trained with only 10\% of the complete dataset, exhibits superior performance. Ablation studies further validate the effectiveness of our method.
- [909] arXiv:2402.03660 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Cross-Task Linearity Emerges in the Pretraining-Finetuning ParadigmComments: 27 pages, 24 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The pretraining-finetuning paradigm has become the prevailing trend in modern deep learning. In this work, we discover an intriguing linear phenomenon in models that are initialized from a common pretrained checkpoint and finetuned on different tasks, termed as Cross-Task Linearity (CTL). Specifically, if we linearly interpolate the weights of two finetuned models, the features in the weight-interpolated model are approximately equal to the linear interpolation of features in two finetuned models at each layer. Such cross-task linearity has not been noted in peer literature. We provide comprehensive empirical evidence supporting that CTL consistently occurs for finetuned models that start from the same pretrained checkpoint. We conjecture that in the pretraining-finetuning paradigm, neural networks essentially function as linear maps, mapping from the parameter space to the feature space. Based on this viewpoint, our study unveils novel insights into explaining model merging/editing, particularly by translating operations from the parameter space to the feature space. Furthermore, we delve deeper into the underlying factors for the emergence of CTL, emphasizing the impact of pretraining.
- [910] arXiv:2402.03661 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Transductive Reward Inference on GraphSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In this study, we present a transductive inference approach on that reward information propagation graph, which enables the effective estimation of rewards for unlabelled data in offline reinforcement learning. Reward inference is the key to learning effective policies in practical scenarios, while direct environmental interactions are either too costly or unethical and the reward functions are rarely accessible, such as in healthcare and robotics. Our research focuses on developing a reward inference method based on the contextual properties of information propagation on graphs that capitalizes on a constrained number of human reward annotations to infer rewards for unlabelled data. We leverage both the available data and limited reward annotations to construct a reward propagation graph, wherein the edge weights incorporate various influential factors pertaining to the rewards. Subsequently, we employ the constructed graph for transductive reward inference, thereby estimating rewards for unlabelled data. Furthermore, we establish the existence of a fixed point during several iterations of the transductive inference process and demonstrate its at least convergence to a local optimum. Empirical evaluations on locomotion and robotic manipulation tasks validate the effectiveness of our approach. The application of our inferred rewards improves the performance in offline reinforcement learning tasks.
- [911] arXiv:2402.03663 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Symbol Correctness in Deep Neural Networks Containing Symbolic LayersSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: To handle AI tasks that combine perception and logical reasoning, recent work introduces Neurosymbolic Deep Neural Networks (NS-DNNs), which contain -- in addition to traditional neural layers -- symbolic layers: symbolic expressions (e.g., SAT formulas, logic programs) that are evaluated by symbolic solvers during inference. We identify and formalize an intuitive, high-level principle that can guide the design and analysis of NS-DNNs: symbol correctness, the correctness of the intermediate symbols predicted by the neural layers with respect to a (generally unknown) ground-truth symbolic representation of the input data. We demonstrate that symbol correctness is a necessary property for NS-DNN explainability and transfer learning (despite being in general impossible to train for). Moreover, we show that the framework of symbol correctness provides a precise way to reason and communicate about model behavior at neural-symbolic boundaries, and gives insight into the fundamental tradeoffs faced by NS-DNN training algorithms. In doing so, we both identify significant points of ambiguity in prior work, and provide a framework to support further NS-DNN developments.
- [912] arXiv:2402.03667 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated ReasoningComments: 20 pages,13 figures,4 tablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Recently, increasing attention has been focused drawn on to improve the ability of Large Language Models (LLMs) to perform complex reasoning. However, previous methods, such as Chain-of-Thought and Self-Consistency, mainly follow Direct Reasoning (DR) frameworks, so they will meet difficulty in solving numerous real-world tasks which can hardly be solved via DR. Therefore, to strengthen the reasoning power of LLMs, this paper proposes a novel Indirect Reasoning (IR) method that employs the logic of contrapositives and contradictions to tackle IR tasks such as factual reasoning and mathematic proof. Specifically, our methodology comprises two steps. Firstly, we leverage the logical equivalence of contrapositive to augment the data and rules to enhance the comprehensibility of LLMs. Secondly, we design a set of prompt templates to trigger LLMs to conduct IR based on proof by contradiction that is logically equivalent to the original DR process. Our IR method is simple yet effective and can be straightforwardly integrated with existing DR methods to further boost the reasoning abilities of LLMs. The experimental results on popular LLMs, such as GPT-3.5-turbo and Gemini-pro, show that our IR method enhances the overall accuracy of factual reasoning by 27.33% and mathematical proof by 31.43%, when compared with traditional DR methods. Moreover, the methods combining IR and DR significantly outperform the methods solely using IR or DR, further demonstrating the effectiveness of our strategy.
- [913] arXiv:2402.03675 (cross-list from q-bio.BM) [ pdf , ps , html , other ]
-
Title: Effective Protein-Protein Interaction Exploration with PPIretrievalSubjects: Biomolecules (q-bio.BM) ; Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
Abstract: Protein-protein interactions (PPIs) are crucial in regulating numerous cellular functions, including signal transduction, transportation, and immune defense. As the accuracy of multi-chain protein complex structure prediction improves, the challenge has shifted towards effectively navigating the vast complex universe to identify potential PPIs. Herein, we propose PPIretrieval, the first deep learning-based model for protein-protein interaction exploration, which leverages existing PPI data to effectively search for potential PPIs in an embedding space, capturing rich geometric and chemical information of protein surfaces. When provided with an unseen query protein with its associated binding site, PPIretrieval effectively identifies a potential binding partner along with its corresponding binding site in an embedding space, facilitating the formation of protein-protein complexes.
- [914] arXiv:2402.03681 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model FeedbackSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Reward engineering has long been a challenge in Reinforcement Learning (RL) research, as it often requires extensive human effort and iterative processes of trial-and-error to design effective reward functions. In this paper, we propose RL-VLM-F, a method that automatically generates reward functions for agents to learn new tasks, using only a text description of the task goal and the agent's visual observations, by leveraging feedbacks from vision language foundation models (VLMs). The key to our approach is to query these models to give preferences over pairs of the agent's image observations based on the text description of the task goal, and then learn a reward function from the preference labels, rather than directly prompting these models to output a raw reward score, which can be noisy and inconsistent. We demonstrate that RL-VLM-F successfully produces effective rewards and policies across various domains - including classic control, as well as manipulation of rigid, articulated, and deformable objects - without the need for human supervision, outperforming prior methods that use large pretrained models for reward generation under the same assumptions. Videos can be found on our project website: this https URL
- [915] arXiv:2402.03686 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Are Machines Better at Complex Reasoning? Unveiling Human-Machine Inference Gaps in Entailment VerificationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Making inferences in text comprehension to understand the meaning is essential in language processing. This work studies the entailment verification (EV) problem of multi-sentence premises that requires a system to make multiple inferences implicitly. Studying EV for such complex premises is important because modern NLP problems, such as detecting inconsistent model-generated rationales, require complex multi-hop reasoning. However, current textual inference datasets mostly contain short premises that only partially focus on these challenges. To address this, we compile an EV benchmark that includes datasets from three NLP domains (NLI, contextual QA, and rationales) containing multi-sentence premises. On benchmarking humans and LLMs, we find that LLMs are better than humans in multi-hop reasoning across extended contexts, while humans perform better in simple deductive reasoning tasks. We also finetune a Flan-T5 model for EV using two training objectives to obtain a strong open-source model that outperforms GPT-3.5 and rivals GPT-4. Finally, we use this model to filter out inconsistent model-generated rationales in self-consistency decoding, resulting in a 6% accuracy improvement on average across three MCQ datasets.
- [916] arXiv:2402.03688 (cross-list from cs.CR) [ pdf , ps , html , other ]
-
Title: A Survey of Privacy Threats and Defense in Vertical Federated Learning: From Model Life Cycle PerspectiveLei Yu , Meng Han , Yiming Li , Changting Lin , Yao Zhang , Mingyang Zhang , Yan Liu , Haiqin Weng , Yuseok Jeon , Ka-Ho Chow , Stacy PattersonSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Vertical Federated Learning (VFL) is a federated learning paradigm where multiple participants, who share the same set of samples but hold different features, jointly train machine learning models. Although VFL enables collaborative machine learning without sharing raw data, it is still susceptible to various privacy threats. In this paper, we conduct the first comprehensive survey of the state-of-the-art in privacy attacks and defenses in VFL. We provide taxonomies for both attacks and defenses, based on their characterizations, and discuss open challenges and future research directions. Specifically, our discussion is structured around the model's life cycle, by delving into the privacy threats encountered during different stages of machine learning and their corresponding countermeasures. This survey not only serves as a resource for the research community but also offers clear guidance and actionable insights for practitioners to safeguard data privacy throughout the model's life cycle.
- [917] arXiv:2402.03694 (cross-list from cs.NI) [ pdf , ps , other ]
-
Title: ServeFlow: A Fast-Slow Model Architecture for Network Traffic AnalysisShinan Liu , Ted Shaowang , Gerry Wan , Jeewon Chae , Jonatas Marques , Sanjay Krishnan , Nick FeamsterSubjects: Networking and Internet Architecture (cs.NI) ; Artificial Intelligence (cs.AI)
Abstract: Network traffic analysis increasingly uses complex machine learning models as the internet consolidates and traffic gets more encrypted. However, over high-bandwidth networks, flows can easily arrive faster than model inference rates. The temporal nature of network flows limits simple scale-out approaches leveraged in other high-traffic machine learning applications. Accordingly, this paper presents ServeFlow, a solution for machine-learning model serving aimed at network traffic analysis tasks, which carefully selects the number of packets to collect and the models to apply for individual flows to achieve a balance between minimal latency, high service rate, and high accuracy. We identify that on the same task, inference time across models can differ by 2.7x-136.3x, while the median inter-packet waiting time is often 6-8 orders of magnitude higher than the inference time! ServeFlow is able to make inferences on 76.3% flows in under 16ms, which is a speed-up of 40.5x on the median end-to-end serving latency while increasing the service rate and maintaining similar accuracy. Even with thousands of features per flow, it achieves a service rate of over 48.5k new flows per second on a 16-core CPU commodity server, which matches the order of magnitude of flow rates observed on city-level network backbones.
- [918] arXiv:2402.03700 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: GenLens: A Systematic Evaluation of Visual GenAI Model OutputsComments: To Appear in IEEE PacificVis 2024Subjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: The rapid development of generative AI (GenAI) models in computer vision necessitates effective evaluation methods to ensure their quality and fairness. Existing tools primarily focus on dataset quality assurance and model explainability, leaving a significant gap in GenAI output evaluation during model development. Current practices often depend on developers' subjective visual assessments, which may lack scalability and generalizability. This paper bridges this gap by conducting a formative study with GenAI model developers in an industrial setting. Our findings led to the development of GenLens, a visual analytic interface designed for the systematic evaluation of GenAI model outputs during the early stages of model development. GenLens offers a quantifiable approach for overviewing and annotating failure cases, customizing issue tags and classifications, and aggregating annotations from multiple users to enhance collaboration. A user study with model developers reveals that GenLens effectively enhances their workflow, evidenced by high satisfaction rates and a strong intent to integrate it into their practices. This research underscores the importance of robust early-stage evaluation tools in GenAI development, contributing to the advancement of fair and high-quality GenAI models.
- [919] arXiv:2402.03706 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: MMAUD: A Comprehensive Multi-Modal Anti-UAV Dataset for Modern Miniature Drone ThreatsShenghai Yuan , Yizhuo Yang , Thien Hoang Nguyen , Thien-Minh Nguyen , Jianfei Yang , Fen Liu , Jianping Li , Han Wang , Lihua XieComments: Accepted by ICRA 2024Subjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: In response to the evolving challenges posed by small unmanned aerial vehicles (UAVs), which possess the potential to transport harmful payloads or independently cause damage, we introduce MMAUD: a comprehensive Multi-Modal Anti-UAV Dataset. MMAUD addresses a critical gap in contemporary threat detection methodologies by focusing on drone detection, UAV-type classification, and trajectory estimation. MMAUD stands out by combining diverse sensory inputs, including stereo vision, various Lidars, Radars, and audio arrays. It offers a unique overhead aerial detection vital for addressing real-world scenarios with higher fidelity than datasets captured on specific vantage points using thermal and RGB. Additionally, MMAUD provides accurate Leica-generated ground truth data, enhancing credibility and enabling confident refinement of algorithms and models, which has never been seen in other datasets. Most existing works do not disclose their datasets, making MMAUD an invaluable resource for developing accurate and efficient solutions. Our proposed modalities are cost-effective and highly adaptable, allowing users to experiment and implement new UAV threat detection tools. Our dataset closely simulates real-world scenarios by incorporating ambient heavy machinery sounds. This approach enhances the dataset's applicability, capturing the exact challenges faced during proximate vehicular operations. It is expected that MMAUD can play a pivotal role in advancing UAV threat detection, classification, trajectory estimation capabilities, and beyond. Our dataset, codes, and designs will be available in this https URL .
- [920] arXiv:2402.03715 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Clarify: Improving Model Robustness With Natural Language CorrectionsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: In supervised learning, models are trained to extract correlations from a static dataset. This often leads to models that rely on high-level misconceptions. To prevent such misconceptions, we must necessarily provide additional information beyond the training data. Existing methods incorporate forms of additional instance-level supervision, such as labels for spurious features or additional labeled data from a balanced distribution. Such strategies can become prohibitively costly for large-scale datasets since they require additional annotation at a scale close to the original training data. We hypothesize that targeted natural language feedback about a model's misconceptions is a more efficient form of additional supervision. We introduce Clarify, a novel interface and method for interactively correcting model misconceptions. Through Clarify, users need only provide a short text description to describe a model's consistent failure patterns. Then, in an entirely automated way, we use such descriptions to improve the training process by reweighting the training data or gathering additional targeted data. Our user studies show that non-expert users can successfully describe model misconceptions via Clarify, improving worst-group accuracy by an average of 17.1% in two datasets. Additionally, we use Clarify to find and rectify 31 novel hard subpopulations in the ImageNet dataset, improving minority-split accuracy from 21.1% to 28.7%.
- [921] arXiv:2402.03719 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Empowering Language Models with Active Inquiry for Deeper UnderstandingJing-Cheng Pang , Heng-Bo Fan , Pengyuan Wang , Jia-Hao Xiao , Nan Tang , Si-Hang Yang , Chengxing Jia , Sheng-Jun Huang , Yang YuSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The rise of large language models (LLMs) has revolutionized the way that we interact with artificial intelligence systems through natural language. However, LLMs often misinterpret user queries because of their uncertain intention, leading to less helpful responses. In natural human interactions, clarification is sought through targeted questioning to uncover obscure information. Thus, in this paper, we introduce LaMAI (Language Model with Active Inquiry), designed to endow LLMs with this same level of interactive engagement. LaMAI leverages active learning techniques to raise the most informative questions, fostering a dynamic bidirectional dialogue. This approach not only narrows the contextual gap but also refines the output of the LLMs, aligning it more closely with user expectations. Our empirical studies, across a variety of complex datasets where LLMs have limited conversational context, demonstrate the effectiveness of LaMAI. The method improves answer accuracy from 31.9% to 50.9%, outperforming other leading question-answering frameworks. Moreover, in scenarios involving human participants, LaMAI consistently generates responses that are superior or comparable to baseline methods in more than 82% of the cases. The applicability of LaMAI is further evidenced by its successful integration with various LLMs, highlighting its potential for the future of interactive language models.
- [922] arXiv:2402.03720 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Similarity-based Neighbor Selection for Graph LLMsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Social and Information Networks (cs.SI)
Abstract: Text-attributed graphs (TAGs) present unique challenges for direct processing by Language Learning Models (LLMs), yet their extensive commonsense knowledge and robust reasoning capabilities offer great promise for node classification in TAGs. Prior research in this field has grappled with issues such as over-squashing, heterophily, and ineffective graph information integration, further compounded by inconsistencies in dataset partitioning and underutilization of advanced LLMs. To address these challenges, we introduce Similarity-based Neighbor Selection (SNS). Using SimCSE and advanced neighbor selection techniques, SNS effectively improves the quality of selected neighbors, thereby improving graph representation and alleviating issues like over-squashing and heterophily. Besides, as an inductive and training-free approach, SNS demonstrates superior generalization and scalability over traditional GNN methods. Our comprehensive experiments, adhering to standard dataset partitioning practices, demonstrate that SNS, through simple prompt interactions with LLMs, consistently outperforms vanilla GNNs and achieves state-of-the-art results on datasets like PubMed in node classification, showcasing LLMs' potential in graph structure understanding. Our research further underscores the significance of graph structure integration in LLM applications and identifies key factors for their success in node classification. Code is available at this https URL .
- [923] arXiv:2402.03741 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: SUB-PLAY: Adversarial Policies against Partially Observed Multi-Agent Reinforcement Learning SystemsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: Recent advances in multi-agent reinforcement learning (MARL) have opened up vast application prospects, including swarm control of drones, collaborative manipulation by robotic arms, and multi-target encirclement. However, potential security threats during the MARL deployment need more attention and thorough investigation. Recent researches reveal that an attacker can rapidly exploit the victim's vulnerabilities and generate adversarial policies, leading to the victim's failure in specific tasks. For example, reducing the winning rate of a superhuman-level Go AI to around 20%. They predominantly focus on two-player competitive environments, assuming attackers possess complete global state observation.
In this study, we unveil, for the first time, the capability of attackers to generate adversarial policies even when restricted to partial observations of the victims in multi-agent competitive environments. Specifically, we propose a novel black-box attack (SUB-PLAY), which incorporates the concept of constructing multiple subgames to mitigate the impact of partial observability and suggests the sharing of transitions among subpolicies to improve the exploitative ability of attackers. Extensive evaluations demonstrate the effectiveness of SUB-PLAY under three typical partial observability limitations. Visualization results indicate that adversarial policies induce significantly different activations of the victims' policy networks. Furthermore, we evaluate three potential defenses aimed at exploring ways to mitigate security threats posed by adversarial policies, providing constructive recommendations for deploying MARL in competitive environments. - [924] arXiv:2402.03750 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Digital Twin Mobility Profiling: A Spatio-Temporal Graph Learning ApproachComments: 10 pages, 7 figuresJournal-ref: The 7th IEEE International Conference on Data Science and Systems (DSS), Dec 20 - 22, 2021, Haikou, ChinaSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Abstract: With the arrival of the big data era, mobility profiling has become a viable method of utilizing enormous amounts of mobility data to create an intelligent transportation system. Mobility profiling can extract potential patterns in urban traffic from mobility data and is critical for a variety of traffic-related applications. However, due to the high level of complexity and the huge amount of data, mobility profiling faces huge challenges. Digital Twin (DT) technology paves the way for cost-effective and performance-optimised management by digitally creating a virtual representation of the network to simulate its behaviour. In order to capture the complex spatio-temporal features in traffic scenario, we construct alignment diagrams to assist in completing the spatio-temporal correlation representation and design dilated alignment convolution network (DACN) to learn the fine-grained correlations, i.e., spatio-temporal interactions. We propose a digital twin mobility profiling (DTMP) framework to learn node profiles on a mobility network DT model. Extensive experiments have been conducted upon three real-world datasets. Experimental results demonstrate the effectiveness of DTMP.
- [925] arXiv:2402.03766 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: MobileVLM V2: Faster and Stronger Baseline for Vision Language ModelXiangxiang Chu , Limeng Qiao , Xinyu Zhang , Shuang Xu , Fei Wei , Yang Yang , Xiaofei Sun , Yiming Hu , Xinyang Lin , Bo Zhang , Chunhua ShenSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: We introduce MobileVLM V2, a family of significantly improved vision language models upon MobileVLM, which proves that a delicate orchestration of novel architectural design, an improved training scheme tailored for mobile VLMs, and rich high-quality dataset curation can substantially benefit VLMs' performance. Specifically, MobileVLM V2 1.7B achieves better or on-par performance on standard VLM benchmarks compared with much larger VLMs at the 3B scale. Notably, our 3B model outperforms a large variety of VLMs at the 7B+ scale. Our models will be released at this https URL .
- [926] arXiv:2402.03774 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Learning a Decision Tree Algorithm with TransformersSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Decision trees are renowned for their interpretability capability to achieve high predictive performance, especially on tabular data. Traditionally, they are constructed through recursive algorithms, where they partition the data at every node in a tree. However, identifying the best partition is challenging, as decision trees optimized for local segments may not bring global generalization. To address this, we introduce MetaTree, which trains a transformer-based model on filtered outputs from classical algorithms to produce strong decision trees for classification. Specifically, we fit both greedy decision trees and optimized decision trees on a large number of datasets. We then train MetaTree to produce the trees that achieve strong generalization performance. This training enables MetaTree to not only emulate these algorithms, but also to intelligently adapt its strategy according to the context, thereby achieving superior generalization performance.
- [927] arXiv:2402.03776 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Large Language Models As MOOCs GradersComments: v1.3 preprintSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Massive open online courses (MOOCs) unlock the doors to free education for anyone around the globe with access to a computer and the internet. Despite this democratization of learning, the massive enrollment in these courses means it is almost impossible for one instructor to assess every student's writing assignment. As a result, peer grading, often guided by a straightforward rubric, is the method of choice. While convenient, peer grading often falls short in terms of reliability and validity. In this study, using 18 distinct settings, we explore the feasibility of leveraging large language models (LLMs) to replace peer grading in MOOCs. Specifically, we focus on two state-of-the-art LLMs: GPT-4 and GPT-3.5, across three distinct courses: Introductory Astronomy, Astrobiology, and the History and Philosophy of Astronomy. To instruct LLMs, we use three different prompts based on a variant of the zero-shot chain-of-thought (Zero-shot-CoT) prompting technique: Zero-shot-CoT combined with instructor-provided correct answers; Zero-shot-CoT in conjunction with both instructor-formulated answers and rubrics; and Zero-shot-CoT with instructor-offered correct answers and LLM-generated rubrics. Our results show that Zero-shot-CoT, when integrated with instructor-provided answers and rubrics, produces grades that are more aligned with those assigned by instructors compared to peer grading. However, the History and Philosophy of Astronomy course proves to be more challenging in terms of grading as opposed to other courses. Finally, our study reveals a promising direction for automating grading systems for MOOCs, especially in subjects with well-defined rubrics.
- [928] arXiv:2402.03780 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Exposing propaganda: an analysis of stylistic cues comparing human annotations and machine classificationGéraud Faye , Benjamin Icard , Morgane Casanova , Julien Chanson , François Maine , François Bancilhon , Guillaume Gadek , Guillaume Gravier , Paul ÉgréComments: Paper to appear in the EACL 2024 Proceedings of the Third Workshop on Understanding Implicit and Underspecified Language (UnImplicit 2024)Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This paper investigates the language of propaganda and its stylistic features. It presents the PPN dataset, standing for Propagandist Pseudo-News, a multisource, multilingual, multimodal dataset composed of news articles extracted from websites identified as propaganda sources by expert agencies. A limited sample from this set was randomly mixed with papers from the regular French press, and their URL masked, to conduct an annotation-experiment by humans, using 11 distinct labels. The results show that human annotators were able to reliably discriminate between the two types of press across each of the labels. We propose different NLP techniques to identify the cues used by the annotators, and to compare them with machine classification. They include the analyzer VAGO to measure discourse vagueness and subjectivity, a TF-IDF to serve as a baseline, and four different classifiers: two RoBERTa-based models, CATS using syntax, and one XGBoost combining syntactic and semantic features.
- [929] arXiv:2402.03781 (cross-list from q-bio.QM) [ pdf , ps , html , other ]
-
Title: MolTC: Towards Molecular Relational Modeling In Language ModelsJunfeng Fang , Shuai Zhang , Chang Wu , Zhengyi Yang , Zhiyuan Liu , Sihang Li , Kun Wang , Wenjie Du , Xiang WangSubjects: Quantitative Methods (q-bio.QM) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Molecular Relational Learning (MRL), aiming to understand interactions between molecular pairs, plays a pivotal role in advancing biochemical research. Recently, the adoption of large language models (LLMs), known for their vast knowledge repositories and advanced logical inference capabilities, has emerged as a promising way for efficient and effective MRL. Despite their potential, these methods predominantly rely on the textual data, thus not fully harnessing the wealth of structural information inherent in molecular graphs. Moreover, the absence of a unified framework exacerbates the issue of information underutilization, as it hinders the sharing of interaction mechanism learned across diverse datasets. To address these challenges, this work proposes a novel LLM-based multi-modal framework for Molecular inTeraction prediction following Chain-of-Thought (CoT) theory, termed MolTC, which effectively integrate graphical information of two molecules in pair. For achieving a unified MRL, MolTC innovatively develops a dynamic parameter-sharing strategy for cross-dataset information sharing. Moreover, to train MolTC efficiently, we introduce a Multi-hierarchical CoT concept to refine its training paradigm, and conduct a comprehensive Molecular Interactive Instructions dataset for the development of biochemical LLMs involving MRL. Our experiments, conducted across various datasets involving over 4,000,000 molecular pairs, exhibit the superiority of our method over current GNN and LLM-based baselines. Code is available at this https URL .
- [930] arXiv:2402.03782 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Soft Prompt Tuning for Cross-Lingual Transfer: When Less is MoreFred Philippy , Siwen Guo , Shohreh Haddadan , Cedric Lothritz , Jacques Klein , Tegawendé F. BissyandéComments: Accepted at the 1st Workshop on Modular and Open Multilingual NLP (co-located with EACL 2024)Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Soft Prompt Tuning (SPT) is a parameter-efficient method for adapting pre-trained language models (PLMs) to specific tasks by inserting learnable embeddings, or soft prompts, at the input layer of the PLM, without modifying its parameters. This paper investigates the potential of SPT for cross-lingual transfer. Unlike previous studies on SPT for cross-lingual transfer that often fine-tune both the soft prompt and the model parameters, we adhere to the original intent of SPT by keeping the model parameters frozen and only training the soft prompt. This does not only reduce the computational cost and storage overhead of full-model fine-tuning, but we also demonstrate that this very parameter efficiency intrinsic to SPT can enhance cross-lingual transfer performance to linguistically distant languages. Moreover, we explore how different factors related to the prompt, such as the length or its reparameterization, affect cross-lingual transfer performance.
- [931] arXiv:2402.03784 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: AirPhyNet: Harnessing Physics-Guided Neural Networks for Air Quality PredictionComments: Accepted by the 12th International Conference on Learning Representations (ICLR 2024)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Applied Physics (physics.app-ph)
Abstract: Air quality prediction and modelling plays a pivotal role in public health and environment management, for individuals and authorities to make informed decisions. Although traditional data-driven models have shown promise in this domain, their long-term prediction accuracy can be limited, especially in scenarios with sparse or incomplete data and they often rely on black-box deep learning structures that lack solid physical foundation leading to reduced transparency and interpretability in predictions. To address these limitations, this paper presents a novel approach named Physics guided Neural Network for Air Quality Prediction (AirPhyNet). Specifically, we leverage two well-established physics principles of air particle movement (diffusion and advection) by representing them as differential equation networks. Then, we utilize a graph structure to integrate physics knowledge into a neural network architecture and exploit latent representations to capture spatio-temporal relationships within the air quality data. Experiments on two real-world benchmark datasets demonstrate that AirPhyNet outperforms state-of-the-art models for different testing scenarios including different lead time (24h, 48h, 72h), sparse data and sudden change prediction, achieving reduction in prediction errors up to 10%. Moreover, a case study further validates that our model captures underlying physical processes of particle movement and generates accurate predictions with real physical meaning.
- [932] arXiv:2402.03792 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: No-Regret Reinforcement Learning in Smooth MDPsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Obtaining no-regret guarantees for reinforcement learning (RL) in the case of problems with continuous state and/or action spaces is still one of the major open challenges in the field. Recently, a variety of solutions have been proposed, but besides very specific settings, the general problem remains unsolved. In this paper, we introduce a novel structural assumption on the Markov decision processes (MDPs), namely $\nu-$smoothness, that generalizes most of the settings proposed so far (e.g., linear MDPs and Lipschitz MDPs). To face this challenging scenario, we propose two algorithms for regret minimization in $\nu-$smooth MDPs. Both algorithms build upon the idea of constructing an MDP representation through an orthogonal feature map based on Legendre polynomials. The first algorithm, \textsc{Legendre-Eleanor}, archives the no-regret property under weaker assumptions but is computationally inefficient, whereas the second one, \textsc{Legendre-LSVI}, runs in polynomial time, although for a smaller class of problems. After analyzing their regret properties, we compare our results with state-of-the-art ones from RL theory, showing that our algorithms achieve the best guarantees.
- [933] arXiv:2402.03796 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Face Detection: Present State and Research DirectionsSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The majority of computer vision applications that handle images featuring humans use face detection as a core component. Face detection still has issues, despite much research on the topic. Face detection's accuracy and speed might yet be increased. This review paper shows the progress made in this area as well as the substantial issues that still need to be tackled. The paper provides research directions that can be taken up as research projects in the field of face detection.
- [934] arXiv:2402.03804 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse LLMsZhengyan Zhang , Yixin Song , Guanghui Yu , Xu Han , Yankai Lin , Chaojun Xiao , Chenyang Song , Zhiyuan Liu , Zeyu Mi , Maosong SunSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Sparse computation offers a compelling solution for the inference of Large Language Models (LLMs) in low-resource scenarios by dynamically skipping the computation of inactive neurons. While traditional approaches focus on ReLU-based LLMs, leveraging zeros in activation values, we broaden the scope of sparse LLMs beyond zero activation values. We introduce a general method that defines neuron activation through neuron output magnitudes and a tailored magnitude threshold, demonstrating that non-ReLU LLMs also exhibit sparse activation. To find the most efficient activation function for sparse computation, we propose a systematic framework to examine the sparsity of LLMs from three aspects: the trade-off between sparsity and performance, the predictivity of sparsity, and the hardware affinity. We conduct thorough experiments on LLMs utilizing different activation functions, including ReLU, SwiGLU, ReGLU, and ReLU$^2$. The results indicate that models employing ReLU$^2$ excel across all three evaluation aspects, highlighting its potential as an efficient activation function for sparse LLMs. We will release the code to facilitate future research.
- [935] arXiv:2402.03807 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: SEABO: A Simple Search-Based Method for Offline Imitation LearningComments: To appear in ICLR2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Offline reinforcement learning (RL) has attracted much attention due to its ability in learning from static offline datasets and eliminating the need of interacting with the environment. Nevertheless, the success of offline RL relies heavily on the offline transitions annotated with reward labels. In practice, we often need to hand-craft the reward function, which is sometimes difficult, labor-intensive, or inefficient. To tackle this challenge, we set our focus on the offline imitation learning (IL) setting, and aim at getting a reward function based on the expert data and unlabeled data. To that end, we propose a simple yet effective search-based offline IL method, tagged SEABO. SEABO allocates a larger reward to the transition that is close to its closest neighbor in the expert demonstration, and a smaller reward otherwise, all in an unsupervised learning manner. Experimental results on a variety of D4RL datasets indicate that SEABO can achieve competitive performance to offline RL algorithms with ground-truth rewards, given only a single expert trajectory, and can outperform prior reward learning and offline IL methods across many tasks. Moreover, we demonstrate that SEABO also works well if the expert demonstrations contain only observations. Our code is publicly available at this https URL .
- [936] arXiv:2402.03843 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: A new method for optical steel rope non-destructive damage detectionSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: This paper presents a novel algorithm for non-destructive damage detection for steel ropes in high-altitude environments (aerial ropeway). The algorithm comprises two key components: First, a segmentation model named RGBD-UNet is designed to accurately extract steel ropes from complex backgrounds. This model is equipped with the capability to process and combine color and depth information through the proposed CMA module. Second, a detection model named VovNetV3.5 is developed to differentiate between normal and abnormal steel ropes. It integrates the VovNet architecture with a DBB module to enhance performance. Besides, a novel background augmentation method is proposed to enhance the generalization ability of the segmentation model. Datasets containing images of steel ropes in different scenarios are created for the training and testing of both the segmentation and detection models. Experiments demonstrate a significant improvement over baseline models. On the proposed dataset, the highest accuracy achieved by the detection model reached 0.975, and the maximum F-measure achieved by the segmentation model reached 0.948.
- [937] arXiv:2402.03848 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: ANLS* -- A Universal Document Processing Metric for Generative Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Traditionally, discriminative models have been the predominant choice for tasks like document classification and information extraction. These models make predictions that fall into a limited number of predefined classes, facilitating a binary true or false evaluation and enabling the direct calculation of metrics such as the F1 score. However, recent advancements in generative large language models (GLLMs) have prompted a shift in the field due to their enhanced zero-shot capabilities, which eliminate the need for a downstream dataset and computationally expensive fine-tuning. However, evaluating GLLMs presents a challenge as the binary true or false evaluation used for discriminative models is not applicable to the predictions made by GLLMs.
This paper introduces a new metric for generative models called ANLS* for evaluating a wide variety of tasks, including information extraction and classification tasks. The ANLS* metric extends existing ANLS metrics as a drop-in-replacement and is still compatible with previously reported ANLS scores. An evaluation of 7 different datasets, 6 different GLLMs and 3 different prompting methods using the ANLS* metric is also provided, demonstrating the importance of the proposed metric.
We also benchmark a novel approach to generate prompts for documents, called SFT, against other prompting techniques such as LATIN. In 27 out of 35 cases, SFT outperforms other techniques and improves the state-of-the-art, sometimes by as much as $18$ percentage points.
Sources are available at this https URL - [938] arXiv:2402.03855 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Position Paper: Toward New Frameworks for Studying Model RepresentationsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Mechanistic interpretability (MI) aims to understand AI models by reverse-engineering the exact algorithms neural networks learn. Most works in MI so far have studied behaviors and capabilities that are trivial and token-aligned. However, most capabilities are not that trivial, which advocates for the study of hidden representations inside these networks as the unit of analysis. We do a literature review, formalize representations for features and behaviors, highlight their importance and evaluation, and perform some basic exploration in the mechanistic interpretability of representations. With discussion and exploratory results, we justify our position that studying representations is an important and under-studied field, and that currently established methods in MI are not sufficient to understand representations, thus pushing for the research community to work toward new frameworks for studying representations.
- [939] arXiv:2402.03877 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Beyond Lines and Circles: Unveiling the Geometric Reasoning Gap in Large Language ModelsComments: Preprint. Work in progressSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large Language Models (LLMs) demonstrate ever-increasing abilities in mathematical and algorithmic tasks, yet their geometric reasoning skills are underexplored. We investigate LLMs' abilities in constructive geometric problem-solving one of the most fundamental steps in the development of human mathematical reasoning. Our work reveals notable challenges that the state-of-the-art LLMs face in this domain despite many successes in similar areas. LLMs exhibit biases in target variable selection and struggle with 2D spatial relationships, often misrepresenting and hallucinating objects and their placements. To this end, we introduce a framework that formulates an LLMs-based multi-agents system that enhances their existing reasoning potential by conducting an internal dialogue. This work underscores LLMs' current limitations in geometric reasoning and improves geometric reasoning capabilities through self-correction, collaboration, and diverse role specializations.
- [940] arXiv:2402.03885 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: MOMENT: A Family of Open Time-series Foundation ModelsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: We introduce MOMENT, a family of open-source foundation models for general-purpose time-series analysis. Pre-training large models on time-series data is challenging due to (1) the absence of a large and cohesive public time-series repository, and (2) diverse time-series characteristics which make multi-dataset training onerous. Additionally, (3) experimental benchmarks to evaluate these models, especially in scenarios with limited resources, time, and supervision, are still in their nascent stages. To address these challenges, we compile a large and diverse collection of public time-series, called the Time-series Pile, and systematically tackle time-series-specific challenges to unlock large-scale multi-dataset pre-training. Finally, we build on recent work to design a benchmark to evaluate time-series foundation models on diverse tasks and datasets in limited supervision settings. Experiments on this benchmark demonstrate the effectiveness of our pre-trained models with minimal data and task-specific fine-tuning. Finally, we present several interesting empirical observations about large pre-trained time-series models. Our code is available anonymously at anonymous.4open.science/r/BETT-773F/.
- [941] arXiv:2402.03898 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: DistiLLM: Towards Streamlined Distillation for Large Language ModelsComments: Code is available at this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Knowledge distillation (KD) is widely used for compressing a teacher model to a smaller student model, reducing its inference cost and memory footprint while preserving model capabilities. However, current KD methods for auto-regressive sequence models (e.g., large language models) suffer from missing a standardized objective function. Moreover, the recent use of student-generated outputs to address training-inference mismatches has significantly escalated computational costs. To tackle these issues, we introduce DistiLLM, a more effective and efficient KD framework for auto-regressive language models. DistiLLM comprises two components: (1) a novel skew Kullback-Leibler divergence loss, where we unveil and leverage its theoretical properties, and (2) an adaptive off-policy approach designed to enhance the efficiency in utilizing student-generated outputs. Extensive experiments, including instruction-following tasks, demonstrate the effectiveness of DistiLLM in building high-performing student models while achieving up to 4.3$\times$ speedup compared to recent KD methods.
- [942] arXiv:2402.03907 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: Embedding Large Language Models into Extended Reality: Opportunities and Challenges for Inclusion, Engagement, and PrivacySubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Recent developments in computer graphics, hardware, artificial intelligence (AI), and human-computer interaction likely lead to extended reality (XR) devices and setups being more pervasive. While these devices and setups provide users with interactive, engaging, and immersive experiences with different sensing modalities, such as eye and hand trackers, many non-player characters are utilized in a pre-scripted way or by conventional AI techniques. In this paper, we argue for using large language models (LLMs) in XR by embedding them in virtual avatars or as narratives to facilitate more inclusive experiences through prompt engineering according to user profiles and fine-tuning the LLMs for particular purposes. We argue that such inclusion will facilitate diversity for XR use. In addition, we believe that with the versatile conversational capabilities of LLMs, users will engage more with XR environments, which might help XR be more used in everyday life. Lastly, we speculate that combining the information provided to LLM-powered environments by the users and the biometric data obtained through the sensors might lead to novel privacy invasions. While studying such possible privacy invasions, user privacy concerns and preferences should also be investigated. In summary, despite some challenges, embedding LLMs into XR is a promising and novel research area with several opportunities.
- [943] arXiv:2402.03921 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Large Language Models to Enhance Bayesian OptimizationComments: Accepted as Poster at ICLR2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Bayesian optimization (BO) is a powerful approach for optimizing complex and expensive-to-evaluate black-box functions. Its importance is underscored in many applications, notably including hyperparameter tuning, but its efficacy depends on efficiently balancing exploration and exploitation. While there has been substantial progress in BO methods, striking this balance remains a delicate process. In this light, we present LLAMBO, a novel approach that integrates the capabilities of Large Language Models (LLM) within BO. At a high level, we frame the BO problem in natural language, enabling LLMs to iteratively propose and evaluate promising solutions conditioned on historical evaluations. More specifically, we explore how combining contextual understanding, few-shot learning proficiency, and domain knowledge of LLMs can improve model-based BO. Our findings illustrate that LLAMBO is effective at zero-shot warmstarting, and enhances surrogate modeling and candidate sampling, especially in the early stages of search when observations are sparse. Our approach is performed in context and does not require LLM finetuning. Additionally, it is modular by design, allowing individual components to be integrated into existing BO frameworks, or function cohesively as an end-to-end method. We empirically validate LLAMBO's efficacy on the problem of hyperparameter tuning, highlighting strong empirical performance across a range of diverse benchmarks, proprietary, and synthetic tasks.
- [944] arXiv:2402.03927 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMsComments: Accepted at EACL 2024 - main conferenceSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Natural Language Processing (NLP) research is increasingly focusing on the use of Large Language Models (LLMs), with some of the most popular ones being either fully or partially closed-source. The lack of access to model details, especially regarding training data, has repeatedly raised concerns about data contamination among researchers. Several attempts have been made to address this issue, but they are limited to anecdotal evidence and trial and error. Additionally, they overlook the problem of \emph{indirect} data leaking, where models are iteratively improved by using data coming from users. In this work, we conduct the first systematic analysis of work using OpenAI's GPT-3.5 and GPT-4, the most prominently used LLMs today, in the context of data contamination. By analysing 255 papers and considering OpenAI's data usage policy, we extensively document the amount of data leaked to these models during the first year after the model's release. We report that these models have been globally exposed to $\sim$4.7M samples from 263 benchmarks. At the same time, we document a number of evaluation malpractices emerging in the reviewed papers, such as unfair or missing baseline comparisons and reproducibility issues. We release our results as a collaborative project on this https URL , where other researchers can contribute to our efforts.
- [945] arXiv:2402.03941 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Discovery of the Hidden World with Large Language ModelsComments: Preliminary version of an ongoing project; Chenxi and Yongqiang contributed equally; 26 pages, 41 figures; Project page: this https URLSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Methodology (stat.ME)
Abstract: Science originates with discovering new causal knowledge from a combination of known facts and observations. Traditional causal discovery approaches mainly rely on high-quality measured variables, usually given by human experts, to find causal relations. However, the causal variables are usually unavailable in a wide range of real-world applications. The rise of large language models (LLMs) that are trained to learn rich knowledge from the massive observations of the world, provides a new opportunity to assist with discovering high-level hidden variables from the raw observational data. Therefore, we introduce COAT: Causal representatiOn AssistanT. COAT incorporates LLMs as a factor proposer that extracts the potential causal factors from unstructured data. Moreover, LLMs can also be instructed to provide additional information used to collect data values (e.g., annotation criteria) and to further parse the raw unstructured data into structured data. The annotated data will be fed to a causal learning module (e.g., the FCI algorithm) that provides both rigorous explanations of the data, as well as useful feedback to further improve the extraction of causal factors by LLMs. We verify the effectiveness of COAT in uncovering the underlying causal system with two case studies of review rating analysis and neuropathic diagnosis.
- [946] arXiv:2402.03948 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Identifying Student Profiles Within Online Judge Systems Using Explainable Artificial IntelligenceJournal-ref: IEEE Transactions on Learning Technologies ( Volume: 16, Issue: 6, December 2023)Subjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: Online Judge (OJ) systems are typically considered within programming-related courses as they yield fast and objective assessments of the code developed by the students. Such an evaluation generally provides a single decision based on a rubric, most commonly whether the submission successfully accomplished the assignment. Nevertheless, since in an educational context such information may be deemed insufficient, it would be beneficial for both the student and the instructor to receive additional feedback about the overall development of the task. This work aims to tackle this limitation by considering the further exploitation of the information gathered by the OJ and automatically inferring feedback for both the student and the instructor. More precisely, we consider the use of learning-based schemes -- particularly, multi-instance learning (MIL) and classical machine learning formulations -- to model student behavior. Besides, explainable artificial intelligence (XAI) is contemplated to provide human-understandable feedback. The proposal has been evaluated considering a case of study comprising 2500 submissions from roughly 90 different students from a programming-related course in a computer science degree. The results obtained validate the proposal: The model is capable of significantly predicting the user outcome (either passing or failing the assignment) solely based on the behavioral pattern inferred by the submissions provided to the OJ. Moreover, the proposal is able to identify prone-to-fail student groups and profiles as well as other relevant information, which eventually serves as feedback to both the student and the instructor.
- [947] arXiv:2402.03951 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Boosting Adversarial Transferability across Model Genus by Deformation-Constrained WarpingQinliang Lin , Cheng Luo , Zenghao Niu , Xilin He , Weicheng Xie , Yuanbo Hou , Linlin Shen , Siyang SongComments: AAAI 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Adversarial examples generated by a surrogate model typically exhibit limited transferability to unknown target systems. To address this problem, many transferability enhancement approaches (e.g., input transformation and model augmentation) have been proposed. However, they show poor performances in attacking systems having different model genera from the surrogate model. In this paper, we propose a novel and generic attacking strategy, called Deformation-Constrained Warping Attack (DeCoWA), that can be effectively applied to cross model genus attack. Specifically, DeCoWA firstly augments input examples via an elastic deformation, namely Deformation-Constrained Warping (DeCoW), to obtain rich local details of the augmented input. To avoid severe distortion of global semantics led by random deformation, DeCoW further constrains the strength and direction of the warping transformation by a novel adaptive control strategy. Extensive experiments demonstrate that the transferable examples crafted by our DeCoWA on CNN surrogates can significantly hinder the performance of Transformers (and vice versa) on various tasks, including image classification, video action recognition, and audio recognition. Code is made available at this https URL .
- [948] arXiv:2402.03970 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Tabular Data: Is Attention All You Need?Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Deep Learning has revolutionized the field of AI and led to remarkable achievements in applications involving image and text data. Unfortunately, there is inconclusive evidence on the merits of neural networks for structured tabular data. In this paper, we introduce a large-scale empirical study comparing neural networks against gradient-boosted decision trees on tabular data, but also transformer-based architectures against traditional multi-layer perceptrons (MLP) with residual connections. In contrast to prior work, our empirical findings indicate that neural networks are competitive against decision trees. Furthermore, we assess that transformer-based architectures do not outperform simpler variants of traditional MLP architectures on tabular datasets. As a result, this paper helps the research and practitioner communities make informed choices on deploying neural networks on future tabular data applications.
- [949] arXiv:2402.03972 (cross-list from cs.MA) [ pdf , ps , other ]
-
Title: Joint Intrinsic Motivation for Coordinated Exploration in Multi-Agent Deep Reinforcement LearningComments: 13 pages, 13 figures. Published as an extended abstract at AAMAS 2024Subjects: Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI)
Abstract: Multi-agent deep reinforcement learning (MADRL) problems often encounter the challenge of sparse rewards. This challenge becomes even more pronounced when coordination among agents is necessary. As performance depends not only on one agent's behavior but rather on the joint behavior of multiple agents, finding an adequate solution becomes significantly harder. In this context, a group of agents can benefit from actively exploring different joint strategies in order to determine the most efficient one. In this paper, we propose an approach for rewarding strategies where agents collectively exhibit novel behaviors. We present JIM (Joint Intrinsic Motivation), a multi-agent intrinsic motivation method that follows the centralized learning with decentralized execution paradigm. JIM rewards joint trajectories based on a centralized measure of novelty designed to function in continuous environments. We demonstrate the strengths of this approach both in a synthetic environment designed to reveal shortcomings of state-of-the-art MADRL methods, and in simulated robotic tasks. Results show that joint exploration is crucial for solving tasks where the optimal strategy requires a high level of coordination.
- [950] arXiv:2402.04009 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Low-rank Attention Side-Tuning for Parameter-Efficient Fine-TuningSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: In finetuning a large pretrained model to downstream tasks, parameter-efficient fine-tuning (PEFT) methods can effectively finetune pretrained models with few trainable parameters, but suffer from high GPU memory consumption and slow training speed. Because learnable parameters from these methods are entangled with the pretrained model, gradients related to the frozen pretrained model's parameters have to be computed and stored during finetuning. We propose Low-rank Attention Side-Tuning (LAST), which disentangles the trainable module from the pretrained model by freezing not only parameters but also outputs of the pretrained network. LAST trains a side-network composed of only low-rank self-attention modules. By viewing the pretrained model as a frozen feature extractor, the side-network takes intermediate output from the pretrained model and focus on learning task-specific knowledge. We also show that LAST can be highly parallel across multiple optimization objectives, making it very efficient in downstream task adaptation, for example, in finding optimal hyperparameters. LAST outperforms previous state-of-the-art methods on VTAB-1K and other visual adaptation tasks with roughly only 30\% of GPU memory footprint and 60\% of training time compared to existing PEFT methods, but achieves significantly higher accuracy.
- [951] arXiv:2402.04028 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: AlbNews: A Corpus of Headlines for Topic Modeling in AlbanianSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The scarcity of available text corpora for low-resource languages like Albanian is a serious hurdle for research in natural language processing tasks. This paper introduces AlbNews, a collection of 600 topically labeled news headlines and 2600 unlabeled ones in Albanian. The data can be freely used for conducting topic modeling research. We report the initial classification scores of some traditional machine learning classifiers trained with the AlbNews samples. These results show that basic models outrun the ensemble learning ones and can serve as a baseline for future experiments.
- [952] arXiv:2402.04032 (cross-list from cs.AR) [ pdf , ps , html , other ]
-
Title: HEAM : Hashed Embedding Acceleration using Processing-In-MemoryComments: 10 pages, 12 figuresSubjects: Hardware Architecture (cs.AR) ; Artificial Intelligence (cs.AI)
Abstract: In today's data centers, personalized recommendation systems face challenges such as the need for large memory capacity and high bandwidth, especially when performing embedding operations. Previous approaches have relied on DIMM-based near-memory processing techniques or introduced 3D-stacked DRAM to address memory-bound issues and expand memory bandwidth. However, these solutions fall short when dealing with the expanding size of personalized recommendation systems. Recommendation models have grown to sizes exceeding tens of terabytes, making them challenging to run efficiently on traditional single-node inference servers. Although various algorithmic methods have been proposed to reduce embedding table capacity, they often result in increased memory access or inefficient utilization of memory resources. This paper introduces HEAM, a heterogeneous memory architecture that integrates 3D-stacked DRAM with DIMM to accelerate recommendation systems in which compositional embedding is utilized-a technique aimed at reducing the size of embedding tables. The architecture is organized into a three-tier memory hierarchy consisting of conventional DIMM, 3D-stacked DRAM with a base die-level Processing-In-Memory (PIM), and a bank group-level PIM incorporating lookup tables. This setup is specifically designed to accommodate the unique aspects of compositional embedding, such as temporal locality and embedding table capacity. This design effectively reduces bank access, improves access efficiency, and enhances overall throughput, resulting in a 6.3 times speedup and 58.9% energy savings compared to the baseline.
- [953] arXiv:2402.04046 (cross-list from cs.SI) [ pdf , ps , other ]
-
Title: Generative Modeling of Graphs via Joint Diffusion of Node and Edge AttributesSubjects: Social and Information Networks (cs.SI) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Graph generation is integral to various engineering and scientific disciplines. Nevertheless, existing methodologies tend to overlook the generation of edge attributes. However, we identify critical applications where edge attributes are essential, making prior methods potentially unsuitable in such contexts. Moreover, while trivial adaptations are available, empirical investigations reveal their limited efficacy as they do not properly model the interplay among graph components. To address this, we propose a joint score-based model of nodes and edges for graph generation that considers all graph components. Our approach offers two key novelties: (i) node and edge attributes are combined in an attention module that generates samples based on the two ingredients; and (ii) node, edge and adjacency information are mutually dependent during the graph diffusion process. We evaluate our method on challenging benchmarks involving real-world and synthetic datasets in which edge features are crucial. Additionally, we introduce a new synthetic dataset that incorporates edge values. Furthermore, we propose a novel application that greatly benefits from the method due to its nature: the generation of traffic scenes represented as graphs. Our method outperforms other graph generation methods, demonstrating a significant advantage in edge-related measures.
- [954] arXiv:2402.04049 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Systematic Biases in LLM Simulations of DebatesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Recent advancements in natural language processing, especially the emergence of Large Language Models (LLMs), have opened exciting possibilities for constructing computational simulations designed to replicate human behavior accurately. However, LLMs are complex statistical learners without straightforward deductive rules, making them prone to unexpected behaviors. In this study, we highlight the limitations of LLMs in simulating human interactions, particularly focusing on LLMs' ability to simulate political debates. Our findings indicate a tendency for LLM agents to conform to the model's inherent social biases despite being directed to debate from certain political perspectives. This tendency results in behavioral patterns that seem to deviate from well-established social dynamics among humans. We reinforce these observations using an automatic self-fine-tuning method, which enables us to manipulate the biases within the LLM and demonstrate that agents subsequently align with the altered biases. These results underscore the need for further research to develop methods that help agents overcome these biases, a critical step toward creating more realistic simulations.
- [955] arXiv:2402.04050 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language ModelsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: With the emergence of pretrained vision-language models (VLMs), considerable efforts have been devoted to fine-tuning them for downstream tasks. Despite the progress made in designing efficient fine-tuning methods, such methods require access to the model's parameters, which can be challenging as model owners often opt to provide their models as a black box to safeguard model ownership. This paper proposes a \textbf{C}ollabo\textbf{ra}tive \textbf{F}ine-\textbf{T}uning (\textbf{CraFT}) approach for fine-tuning black-box VLMs to downstream tasks, where one only has access to the input prompts and the output predictions of the model. CraFT comprises two modules, a prompt generation module for learning text prompts and a prediction refinement module for enhancing output predictions in residual style. Additionally, we introduce an auxiliary prediction-consistent loss to promote consistent optimization across these modules. These modules are optimized by a novel collaborative training algorithm. Extensive experiments on few-shot classification over 15 datasets demonstrate the superiority of CraFT. The results show that CraFT achieves a decent gain of about 12\% with 16-shot datasets and only 8,000 queries. Moreover, CraFT trains faster and uses only about 1/80 of the memory footprint for deployment, while sacrificing only 1.62\% compared to the white-box method.
- [956] arXiv:2402.04059 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Deep Learning for Multivariate Time Series Imputation: A SurveyComments: 9 pages, 1 figure, 5 tables, 58 referred papersSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The ubiquitous missing values cause the multivariate time series data to be partially observed, destroying the integrity of time series and hindering the effective time series data analysis. Recently deep learning imputation methods have demonstrated remarkable success in elevating the quality of corrupted time series data, subsequently enhancing performance in downstream tasks. In this paper, we conduct a comprehensive survey on the recently proposed deep learning imputation methods. First, we propose a taxonomy for the reviewed methods, and then provide a structured review of these methods by highlighting their strengths and limitations. We also conduct empirical experiments to study different methods and compare their enhancement for downstream tasks. Finally, the open issues for future research on multivariate time series imputation are pointed out. All code and configurations of this work, including a regularly maintained multivariate time series imputation paper list, can be found in the GitHub repository~\url{ this https URL \_Imputation}.
- [957] arXiv:2402.04062 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Link Prediction with Relational HypergraphsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Link prediction with knowledge graphs has been thoroughly studied in graph machine learning, leading to a rich landscape of graph neural network architectures with successful applications. Nonetheless, it remains challenging to transfer the success of these architectures to link prediction with relational hypergraphs. The presence of relational hyperedges makes link prediction a task between $k$ nodes for varying choices of $k$, which is substantially harder than link prediction with knowledge graphs, where every relation is binary ($k=2$). In this paper, we propose two frameworks for link prediction with relational hypergraphs and conduct a thorough analysis of the expressive power of the resulting model architectures via corresponding relational Weisfeiler-Leman algorithms, and also via some natural logical formalisms. Through extensive empirical analysis, we validate the power of the proposed model architectures on various relational hypergraph benchmarks. The resulting model architectures substantially outperform every baseline for inductive link prediction, and lead to state-of-the-art results for transductive link prediction. Our study therefore unlocks applications of graph neural networks to fully relational structures.
- [958] arXiv:2402.04064 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Multi-class Road Defect Detection and Segmentation using Spatial and Channel-wise Attention for Autonomous Road RepairingComments: Accepted to the ICRA 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Road pavement detection and segmentation are critical for developing autonomous road repair systems. However, developing an instance segmentation method that simultaneously performs multi-class defect detection and segmentation is challenging due to the textural simplicity of road pavement image, the diversity of defect geometries, and the morphological ambiguity between classes. We propose a novel end-to-end method for multi-class road defect detection and segmentation. The proposed method comprises multiple spatial and channel-wise attention blocks available to learn global representations across spatial and channel-wise dimensions. Through these attention blocks, more globally generalised representations of morphological information (spatial characteristics) of road defects and colour and depth information of images can be learned. To demonstrate the effectiveness of our framework, we conducted various ablation studies and comparisons with prior methods on a newly collected dataset annotated with nine road defect classes. The experiments show that our proposed method outperforms existing state-of-the-art methods for multi-class road defect detection and segmentation methods.
- [959] arXiv:2402.04081 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Improved Generalization of Weight Space Networks via AugmentationsComments: Under ReviewSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Learning in deep weight spaces (DWS), where neural networks process the weights of other neural networks, is an emerging research direction, with applications to 2D and 3D neural fields (INRs, NeRFs), as well as making inferences about other types of neural networks. Unfortunately, weight space models tend to suffer from substantial overfitting. We empirically analyze the reasons for this overfitting and find that a key reason is the lack of diversity in DWS datasets. While a given object can be represented by many different weight configurations, typical INR training sets fail to capture variability across INRs that represent the same object. To address this, we explore strategies for data augmentation in weight spaces and propose a MixUp method adapted for weight spaces. We demonstrate the effectiveness of these methods in two setups. In classification, they improve performance similarly to having up to 10 times more data. In self-supervised contrastive learning, they yield substantial 5-10% gains in downstream classification.
- [960] arXiv:2402.04082 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: An Optimal House Price Prediction Algorithm: XGBoostComments: 16 pages, Journal of AnalyticsJournal-ref: Analytics, 3(1), 30-45 (2024)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Applications (stat.AP); Methodology (stat.ME)
Abstract: An accurate prediction of house prices is a fundamental requirement for various sectors including real estate and mortgage lending. It is widely recognized that a property value is not solely determined by its physical attributes but is significantly influenced by its surrounding neighbourhood. Meeting the diverse housing needs of individuals while balancing budget constraints is a primary concern for real estate developers. To this end, we addressed the house price prediction problem as a regression task and thus employed various machine learning techniques capable of expressing the significance of independent variables. We made use of the housing dataset of Ames City in Iowa, USA to compare support vector regressor, random forest regressor, XGBoost, multilayer perceptron and multiple linear regression algorithms for house price prediction. Afterwards, we identified the key factors that influence housing costs. Our results show that XGBoost is the best performing model for house price prediction.
- [961] arXiv:2402.04087 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: A Hard-to-Beat Baseline for Training-free CLIP-based AdaptationComments: Accepted by ICLR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Contrastive Language-Image Pretraining (CLIP) has gained popularity for its remarkable zero-shot capacity. Recent research has focused on developing efficient fine-tuning methods, such as prompt learning and adapter, to enhance CLIP's performance in downstream tasks. However, these methods still require additional training time and computational resources, which is undesirable for devices with limited resources. In this paper, we revisit a classical algorithm, Gaussian Discriminant Analysis (GDA), and apply it to the downstream classification of CLIP. Typically, GDA assumes that features of each class follow Gaussian distributions with identical covariance. By leveraging Bayes' formula, the classifier can be expressed in terms of the class means and covariance, which can be estimated from the data without the need for training. To integrate knowledge from both visual and textual modalities, we ensemble it with the original zero-shot classifier within CLIP. Extensive results on 17 datasets validate that our method surpasses or achieves comparable results with state-of-the-art methods on few-shot classification, imbalanced learning, and out-of-distribution generalization. In addition, we extend our method to base-to-new generalization and unsupervised learning, once again demonstrating its superiority over competing approaches. Our code is publicly available at \url{ this https URL }.
- [962] arXiv:2402.04088 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: The Use of a Large Language Model for Cyberbullying DetectionComments: 14 pages, Journal of AnalyticsJournal-ref: Analytics 2 (2023), no. 3: 694-707Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Applications (stat.AP)
Abstract: The dominance of social media has added to the channels of bullying for perpetrators. Unfortunately, cyberbullying (CB) is the most prevalent phenomenon in todays cyber world, and is a severe threat to the mental and physical health of citizens. This opens the need to develop a robust system to prevent bullying content from online forums, blogs, and social media platforms to manage the impact in our society. Several machine learning (ML) algorithms have been proposed for this purpose. However, their performances are not consistent due to high class imbalance and generalisation issues. In recent years, large language models (LLMs) like BERT and RoBERTa have achieved state-of-the-art (SOTA) results in several natural language processing (NLP) tasks. Unfortunately, the LLMs have not been applied extensively for CB detection. In our paper, we explored the use of these models for cyberbullying (CB) detection. We have prepared a new dataset (D2) from existing studies (Formspring and Twitter). Our experimental results for dataset D1 and D2 showed that RoBERTa outperformed other models.
- [963] arXiv:2402.04102 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: Use of Multi-CNNs for Section Analysis in Static Malware DetectionComments: arXiv admin note: text overlap with arXiv:2312.12161Subjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: Existing research on malware detection focuses almost exclusively on the detection rate. However, in some cases, it is also important to understand the results of our algorithm, or to obtain more information, such as where to investigate in the file for an analyst. In this aim, we propose a new model to analyze Portable Executable files. Our method consists in splitting the files in different sections, then transform each section into an image, in order to train convolutional neural networks to treat specifically each identified section. Then we use all these scores returned by CNNs to compute a final detection score, using models that enable us to improve our analysis of the importance of each section in the final score.
- [964] arXiv:2402.04103 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: An Exploration of Clustering Algorithms for Customer Segmentation in the UK Retail MarketComments: 15 pages, Journal of AnalyticsJournal-ref: Analytics, 2(4), 809-823 (2023)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Applications (stat.AP); Computation (stat.CO)
Abstract: Recently, peoples awareness of online purchases has significantly risen. This has given rise to online retail platforms and the need for a better understanding of customer purchasing behaviour. Retail companies are pressed with the need to deal with a high volume of customer purchases, which requires sophisticated approaches to perform more accurate and efficient customer segmentation. Customer segmentation is a marketing analytical tool that aids customer-centric service and thus enhances profitability. In this paper, we aim to develop a customer segmentation model to improve decision-making processes in the retail market industry. To achieve this, we employed a UK-based online retail dataset obtained from the UCI machine learning repository. The retail dataset consists of 541,909 customer records and eight features. Our study adopted the RFM (recency, frequency, and monetary) framework to quantify customer values. Thereafter, we compared several state-of-the-art (SOTA) clustering algorithms, namely, K-means clustering, the Gaussian mixture model (GMM), density-based spatial clustering of applications with noise (DBSCAN), agglomerative clustering, and balanced iterative reducing and clustering using hierarchies (BIRCH). The results showed the GMM outperformed other approaches, with a Silhouette Score of 0.80.
- [965] arXiv:2402.04108 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Hierarchical Delay Attribution Classification using Unstructured Text in Train Management SystemsComments: 22 pages, 7 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: EU directives stipulate a systematic follow-up of train delays. In Sweden, the Swedish Transport Administration registers and assigns an appropriate delay attribution code. However, this delay attribution code is assigned manually, which is a complex task. In this paper, a machine learning-based decision support for assigning delay attribution codes based on event descriptions is investigated. The text is transformed using TF-IDF, and two models, Random Forest and Support Vector Machine, are evaluated against a random uniform classifier and the classification performance of the Swedish Transport Administration. Further, the problem is modeled as both a hierarchical and flat approach. The results indicate that a hierarchical approach performs better than a flat approach. Both approaches perform better than the random uniform classifier but perform worse than the manual classification.
- [966] arXiv:2402.04141 (cross-list from cs.SE) [ pdf , ps , other ]
-
Title: Multi-line AI-assisted Code AuthoringOmer Dunay , Daniel Cheng , Adam Tait , Parth Thakkar , Peter C Rigby , Andy Chiu , Imad Ahmad , Arun Ganesan , Chandra Maddila , Vijayaraghavan Murali , Ali Tayyebi , Nachiappan NagappanSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: CodeCompose is an AI-assisted code authoring tool powered by large language models (LLMs) that provides inline suggestions to 10's of thousands of developers at Meta. In this paper, we present how we scaled the product from displaying single-line suggestions to multi-line suggestions. This evolution required us to overcome several unique challenges in improving the usability of these suggestions for developers.
First, we discuss how multi-line suggestions can have a 'jarring' effect, as the LLM's suggestions constantly move around the developer's existing code, which would otherwise result in decreased productivity and satisfaction.
Second, multi-line suggestions take significantly longer to generate; hence we present several innovative investments we made to reduce the perceived latency for users. These model-hosting optimizations sped up multi-line suggestion latency by 2.5x.
Finally, we conduct experiments on 10's of thousands of engineers to understand how multi-line suggestions impact the user experience and contrast this with single-line suggestions. Our experiments reveal that (i) multi-line suggestions account for 42% of total characters accepted (despite only accounting for 16% for displayed suggestions) (ii) multi-line suggestions almost doubled the percentage of keystrokes saved for users from 9% to 17%. Multi-line CodeCompose has been rolled out to all engineers at Meta, and less than 1% of engineers have opted out of multi-line suggestions. - [967] arXiv:2402.04173 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: COPS: A Compact On-device Pipeline for real-time Smishing detectionComments: Published at IEEE Consumer Communications & Networking Conference (CCNC) 2024Subjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: Smartphones have become indispensable in our daily lives and can do almost everything, from communication to online shopping. However, with the increased usage, cybercrime aimed at mobile devices is rocketing. Smishing attacks, in particular, have observed a significant upsurge in recent years. This problem is further exacerbated by the perpetrator creating new deceptive websites daily, with an average life cycle of under 15 hours. This renders the standard practice of keeping a database of malicious URLs ineffective. To this end, we propose a novel on-device pipeline: COPS that intelligently identifies features of fraudulent messages and URLs to alert the user in real-time. COPS is a lightweight pipeline with a detection module based on the Disentangled Variational Autoencoder of size 3.46MB for smishing and URL phishing detection, and we benchmark it on open datasets. We achieve an accuracy of 98.15% and 99.5%, respectively, for both tasks, with a false negative and false positive rate of a mere 0.037 and 0.015, outperforming previous works with the added advantage of ensuring real-time alerts on resource-constrained devices.
- [968] arXiv:2402.04209 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Acute kidney injury prediction for non-critical care patients: a retrospective external and internal validation studyEsra Adiyeke , Yuanfang Ren , Benjamin Shickel , Matthew M. Ruppert , Ziyuan Guan , Sandra L. Kane-Gill , Raghavan Murugan , Nabihah Amatullah , Britney A. Stottlemyer , Tiffany L. Tran , Dan Ricketts , Christopher M Horvat , Parisa Rashidi , Azra Bihorac , Tezcan Ozrazgat-BaslantiSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Background: Acute kidney injury (AKI), the decline of kidney excretory function, occurs in up to 18% of hospitalized admissions. Progression of AKI may lead to irreversible kidney damage. Methods: This retrospective cohort study includes adult patients admitted to a non-intensive care unit at the University of Pittsburgh Medical Center (UPMC) (n = 46,815) and University of Florida Health (UFH) (n = 127,202). We developed and compared deep learning and conventional machine learning models to predict progression to Stage 2 or higher AKI within the next 48 hours. We trained local models for each site (UFH Model trained on UFH, UPMC Model trained on UPMC) and a separate model with a development cohort of patients from both sites (UFH-UPMC Model). We internally and externally validated the models on each site and performed subgroup analyses across sex and race. Results: Stage 2 or higher AKI occurred in 3% (n=3,257) and 8% (n=2,296) of UFH and UPMC patients, respectively. Area under the receiver operating curve values (AUROC) for the UFH test cohort ranged between 0.77 (UPMC Model) and 0.81 (UFH Model), while AUROC values ranged between 0.79 (UFH Model) and 0.83 (UPMC Model) for the UPMC test cohort. UFH-UPMC Model achieved an AUROC of 0.81 (95% confidence interval [CI] [0.80, 0.83]) for UFH and 0.82 (95% CI [0.81,0.84]) for UPMC test cohorts; an area under the precision recall curve values (AUPRC) of 0.6 (95% CI, [0.05, 0.06]) for UFH and 0.13 (95% CI, [0.11,0.15]) for UPMC test cohorts. Kinetic estimated glomerular filtration rate, nephrotoxic drug burden and blood urea nitrogen remained the top three features with the highest influence across the models and health centers. Conclusion: Locally developed models displayed marginally reduced discrimination when tested on another institution, while the top set of influencing features remained the same across the models and sites.
- [969] arXiv:2402.04228 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: Intelligent Collective Escape of Swarm Robots Based on a Novel Fish-inspired Self-adaptive Approach with Neurodynamic ModelsComments: This article is accepted for publication in a future issue of IEEE Transactions on Industrial ElectronicsSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract: Fish schools present high-efficiency group behaviors through simple individual interactions to collective migration and dynamic escape from the predator. The school behavior of fish is usually a good inspiration to design control architecture for swarm robots. In this paper, a novel fish-inspired self-adaptive approach is proposed for collective escape for the swarm robots. In addition, a bio-inspired neural network (BINN) is introduced to generate collision-free escape robot trajectories through the combination of attractive and repulsive forces. Furthermore, to cope with dynamic environments, a neurodynamics-based self-adaptive mechanism is proposed to improve the self-adaptive performance of the swarm robots in the changing environment. Similar to fish escape maneuvers, simulation and experimental results show that the swarm robots are capable of collectively leaving away from the threats. Several comparison studies demonstrated that the proposed approach can significantly improve the effectiveness and efficiency of system performance, and the flexibility and robustness in complex environments.
- [970] arXiv:2402.04247 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for ScienceXiangru Tang , Qiao Jin , Kunlun Zhu , Tongxin Yuan , Yichi Zhang , Wangchunshu Zhou , Meng Qu , Yilun Zhao , Jian Tang , Zhuosheng Zhang , Arman Cohan , Zhiyong Lu , Mark GersteinSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Intelligent agents powered by large language models (LLMs) have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines. While their capabilities are promising, they also introduce novel vulnerabilities that demand careful consideration for safety. However, there exists a notable gap in the literature, as there has been no comprehensive exploration of these vulnerabilities. This position paper fills this gap by conducting a thorough examination of vulnerabilities in LLM-based agents within scientific domains, shedding light on potential risks associated with their misuse and emphasizing the need for safety measures. We begin by providing a comprehensive overview of the potential risks inherent to scientific LLM agents, taking into account user intent, the specific scientific domain, and their potential impact on the external environment. Then, we delve into the origins of these vulnerabilities and provide a scoping review of the limited existing works. Based on our analysis, we propose a triadic framework involving human regulation, agent alignment, and an understanding of environmental feedback (agent regulation) to mitigate these identified risks. Furthermore, we highlight the limitations and challenges associated with safeguarding scientific agents and advocate for the development of improved models, robust benchmarks, and comprehensive regulations to address these issues effectively.
- [971] arXiv:2402.04249 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust RefusalMantas Mazeika , Long Phan , Xuwang Yin , Andy Zou , Zifan Wang , Norman Mu , Elham Sakhaee , Nathaniel Li , Steven Basart , Bo Li , David Forsyth , Dan HendrycksComments: Website: this https URLSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Automated red teaming holds substantial promise for uncovering and mitigating the risks associated with the malicious use of large language models (LLMs), yet the field lacks a standardized evaluation framework to rigorously assess new methods. To address this issue, we introduce HarmBench, a standardized evaluation framework for automated red teaming. We identify several desirable properties previously unaccounted for in red teaming evaluations and systematically design HarmBench to meet these criteria. Using HarmBench, we conduct a large-scale comparison of 18 red teaming methods and 33 target LLMs and defenses, yielding novel insights. We also introduce a highly efficient adversarial training method that greatly enhances LLM robustness across a wide range of attacks, demonstrating how HarmBench enables codevelopment of attacks and defenses. We open source HarmBench at this https URL .
- [972] arXiv:2402.04267 (cross-list from physics.med-ph) [ pdf , ps , other ]
-
Title: Application analysis of ai technology combined with spiral CT scanning in early lung cancer screeningComments: This article was accepted by Frontiers in Computing and Intelligent Systems this https URL . arXiv admin note: text overlap with arXiv:nlin/0508031 by other authorsSubjects: Medical Physics (physics.med-ph) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract: At present, the incidence and fatality rate of lung cancer in China rank first among all malignant tumors. Despite the continuous development and improvement of China's medical level, the overall 5-year survival rate of lung cancer patients is still lower than 20% and is staged. A number of studies have confirmed that early diagnosis and treatment of early stage lung cancer is of great significance to improve the prognosis of patients. In recent years, artificial intelligence technology has gradually begun to be applied in oncology. ai is used in cancer screening, clinical diagnosis, radiation therapy (image acquisition, at-risk organ segmentation, image calibration and delivery) and other aspects of rapid development. However, whether medical ai can be socialized depends on the public's attitude and acceptance to a certain extent. However, at present, there are few studies on the diagnosis of early lung cancer by AI technology combined with SCT scanning. In view of this, this study applied the combined method in early lung cancer screening, aiming to find a safe and efficient screening mode and provide a reference for clinical diagnosis and treatment.
- [973] arXiv:2402.04268 (cross-list from cond-mat.soft) [ pdf , ps , html , other ]
-
Title: ProtAgents: Protein discovery via large language model multi-agent collaborations combining physics and machine learningSubjects: Soft Condensed Matter (cond-mat.soft) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Biomolecules (q-bio.BM)
Abstract: Designing de novo proteins beyond those found in nature holds significant promise for advancements in both scientific and engineering applications. Current methodologies for protein design often rely on AI-based models, such as surrogate models that address end-to-end problems by linking protein structure to material properties or vice versa. However, these models frequently focus on specific material objectives or structural properties, limiting their flexibility when incorporating out-of-domain knowledge into the design process or comprehensive data analysis is required. In this study, we introduce ProtAgents, a platform for de novo protein design based on Large Language Models (LLMs), where multiple AI agents with distinct capabilities collaboratively address complex tasks within a dynamic environment. The versatility in agent development allows for expertise in diverse domains, including knowledge retrieval, protein structure analysis, physics-based simulations, and results analysis. The dynamic collaboration between agents, empowered by LLMs, provides a versatile approach to tackling protein design and analysis problems, as demonstrated through diverse examples in this study. The problems of interest encompass designing new proteins, analyzing protein structures and obtaining new first-principles data -- natural vibrational frequencies -- via physics simulations. The concerted effort of the system allows for powerful automated and synergistic design of de novo proteins with targeted mechanical properties. The flexibility in designing the agents, on one hand, and their capacity in autonomous collaboration through the dynamic LLM-based multi-agent environment on the other hand, unleashes great potentials of LLMs in addressing multi-objective materials problems and opens up new avenues for autonomous materials discovery and design.
- [974] arXiv:2402.04275 (cross-list from q-bio.NC) [ pdf , ps , other ]
-
Title: Motion Mapping Cognition: A Nondecomposable Primary Process in Human VisionComments: 7 pages, 3 figuresSubjects: Neurons and Cognition (q-bio.NC) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Human intelligence seems so mysterious that we have not successfully understood its foundation until now. Here, I want to present a basic cognitive process, motion mapping cognition (MMC), which should be a nondecomposable primary function in human vision. Wherein, I point out that, MMC process can be used to explain most of human visual functions in fundamental, but can not be effectively modelled by traditional visual processing ways including image segmentation, object recognition, object tracking etc. Furthermore, I state that MMC may be looked as an extension of Chen's theory of topological perception on human vision, and seems to be unsolvable using existing intelligent algorithm skills. Finally, along with the requirements of MMC problem, an interesting computational model, quantized topological matching principle can be derived by developing the idea of optimal transport theory. Above results may give us huge inspiration to develop more robust and interpretable machine vision models.
- [975] arXiv:2402.04286 (cross-list from q-bio.QM) [ pdf , ps , other ]
-
Title: Progress and Opportunities of Foundation Models in BioinformaticsComments: 27 pages, 3 figures, 2 tablesSubjects: Quantitative Methods (q-bio.QM) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Bioinformatics has witnessed a paradigm shift with the increasing integration of artificial intelligence (AI), particularly through the adoption of foundation models (FMs). These AI techniques have rapidly advanced, addressing historical challenges in bioinformatics such as the scarcity of annotated data and the presence of data noise. FMs are particularly adept at handling large-scale, unlabeled data, a common scenario in biological contexts due to the time-consuming and costly nature of experimentally determining labeled data. This characteristic has allowed FMs to excel and achieve notable results in various downstream validation tasks, demonstrating their ability to represent diverse biological entities effectively. Undoubtedly, FMs have ushered in a new era in computational biology, especially in the realm of deep learning. The primary goal of this survey is to conduct a systematic investigation and summary of FMs in bioinformatics, tracing their evolution, current research status, and the methodologies employed. Central to our focus is the application of FMs to specific biological problems, aiming to guide the research community in choosing appropriate FMs for their research needs. We delve into the specifics of the problem at hand including sequence analysis, structure prediction, function annotation, and multimodal integration, comparing the structures and advancements against traditional methods. Furthermore, the review analyses challenges and limitations faced by FMs in biology, such as data noise, model explainability, and potential biases. Finally, we outline potential development paths and strategies for FMs in future biological research, setting the stage for continued innovation and application in this rapidly evolving field. This comprehensive review serves not only as an academic resource but also as a roadmap for future explorations and applications of FMs in biology.
- [976] arXiv:2402.04290 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: CasCast: Skillful High-resolution Precipitation Nowcasting via Cascaded ModellingSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Precipitation nowcasting based on radar data plays a crucial role in extreme weather prediction and has broad implications for disaster management. Despite progresses have been made based on deep learning, two key challenges of precipitation nowcasting are not well-solved: (i) the modeling of complex precipitation system evolutions with different scales, and (ii) accurate forecasts for extreme precipitation. In this work, we propose CasCast, a cascaded framework composed of a deterministic and a probabilistic part to decouple the predictions for mesoscale precipitation distributions and small-scale patterns. Then, we explore training the cascaded framework at the high resolution and conducting the probabilistic modeling in a low dimensional latent space with a frame-wise-guided diffusion transformer for enhancing the optimization of extreme events while reducing computational costs. Extensive experiments on three benchmark radar precipitation datasets show that CasCast achieves competitive performance. Especially, CasCast significantly surpasses the baseline (up to +91.8%) for regional extreme-precipitation nowcasting.
- [977] arXiv:2402.04291 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: BiLLM: Pushing the Limit of Post-Training Quantization for LLMsWei Huang , Yangdong Liu , Haotong Qin , Ying Li , Shiming Zhang , Xianglong Liu , Michele Magno , Xiaojuan QiComments: 19 pagesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Pretrained large language models (LLMs) exhibit exceptional general language processing capabilities but come with significant demands on memory and computational resources. As a powerful compression technology, binarization can extremely reduce model weights to a mere 1 bit, lowering the expensive computation and memory requirements. However, existing quantization techniques fall short of maintaining LLM performance under ultra-low bit-widths. In response to this challenge, we present BiLLM, a groundbreaking 1-bit post-training quantization scheme tailored for pretrained LLMs. Based on the weight distribution of LLMs, BiLLM first identifies and structurally selects salient weights, and minimizes the compression loss through an effective binary residual approximation strategy. Moreover, considering the bell-shaped distribution of the non-salient weights, we propose an optimal splitting search to group and binarize them accurately. BiLLM achieving for the first time high-accuracy inference (e.g. 8.41 perplexity on LLaMA2-70B) with only 1.08-bit weights across various LLMs families and evaluation metrics, outperforms SOTA quantization methods of LLM by significant margins. Moreover, BiLLM enables the binarization process of the LLM with 7 billion weights within 0.5 hours on a single GPU, demonstrating satisfactory time efficiency.
- [978] arXiv:2402.04292 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: AdaFlow: Imitation Learning with Variance-Adaptive Flow-Based PoliciesComments: 18 pagesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Diffusion-based imitation learning improves Behavioral Cloning (BC) on multi-modal decision-making, but comes at the cost of significantly slower inference due to the recursion in the diffusion process. It urges us to design efficient policy generators while keeping the ability to generate diverse actions. To address this challenge, we propose AdaFlow, an imitation learning framework based on flow-based generative modeling. AdaFlow represents the policy with state-conditioned ordinary differential equations (ODEs), which are known as probability flows. We reveal an intriguing connection between the conditional variance of their training loss and the discretization error of the ODEs. With this insight, we propose a variance-adaptive ODE solver that can adjust its step size in the inference stage, making AdaFlow an adaptive decision-maker, offering rapid inference without sacrificing diversity. Interestingly, it automatically reduces to a one-step generator when the action distribution is uni-modal. Our comprehensive empirical evaluation shows that AdaFlow achieves high performance across all dimensions, including success rate, behavioral diversity, and inference speed. The code is available at this https URL
- [979] arXiv:2402.04325 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Enhance DNN Adversarial Robustness and Efficiency via Injecting Noise to Non-Essential NeuronsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: Deep Neural Networks (DNNs) have revolutionized a wide range of industries, from healthcare and finance to automotive, by offering unparalleled capabilities in data analysis and decision-making. Despite their transforming impact, DNNs face two critical challenges: the vulnerability to adversarial attacks and the increasing computational costs associated with more complex and larger models. In this paper, we introduce an effective method designed to simultaneously enhance adversarial robustness and execution efficiency. Unlike prior studies that enhance robustness via uniformly injecting noise, we introduce a non-uniform noise injection algorithm, strategically applied at each DNN layer to disrupt adversarial perturbations introduced in attacks. By employing approximation techniques, our approach identifies and protects essential neurons while strategically introducing noise into non-essential neurons. Our experimental results demonstrate that our method successfully enhances both robustness and efficiency across several attack scenarios, model architectures, and datasets.
- [980] arXiv:2402.04333 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: LESS: Selecting Influential Data for Targeted Instruction TuningComments: Code and data are available at this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Instruction tuning has unlocked powerful capabilities in large language models (LLMs), effectively using combined datasets to develop generalpurpose chatbots. However, real-world applications often require a specialized suite of skills (e.g., reasoning). The challenge lies in identifying the most relevant data from these extensive datasets to effectively develop specific capabilities, a setting we frame as targeted instruction tuning. We propose LESS, an optimizer-aware and practically efficient algorithm to effectively estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection. Crucially, LESS adapts existing influence formulations to work with the Adam optimizer and variable-length instruction data. LESS first constructs a highly reusable and transferable gradient datastore with low-dimensional gradient features and then selects examples based on their similarity to few-shot examples embodying a specific capability. Experiments show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks. Furthermore, the selected data is highly transferable: smaller models can be leveraged to select useful data for larger models and models from different families. Our qualitative analysis shows that our method goes beyond surface form cues to identify data that exemplifies the necessary reasoning skills for the intended downstream application.
- [981] arXiv:2402.04335 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: LegalLens: Leveraging LLMs for Legal Violation Identification in Unstructured TextDor Bernsohn , Gil Semo , Yaron Vazana , Gila Hayat , Ben Hagag , Joel Niklaus , Rohit Saha , Kyryl TruskovskyiSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: In this study, we focus on two main tasks, the first for detecting legal violations within unstructured textual data, and the second for associating these violations with potentially affected individuals. We constructed two datasets using Large Language Models (LLMs) which were subsequently validated by domain expert annotators. Both tasks were designed specifically for the context of class-action cases. The experimental design incorporated fine-tuning models from the BERT family and open-source LLMs, and conducting few-shot experiments using closed-source LLMs. Our results, with an F1-score of 62.69\% (violation identification) and 81.02\% (associating victims), show that our datasets and setups can be used for both tasks. Finally, we publicly release the datasets and the code used for the experiments in order to advance further research in the area of legal natural language processing (NLP).
- [982] arXiv:2402.04355 (cross-list from stat.ML) [ pdf , ps , html , other ]
-
Title: PQMass: Probabilistic Assessment of the Quality of Generative Models using Probability Mass EstimationComments: 14 pages, 13 figuresSubjects: Machine Learning (stat.ML) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Methodology (stat.ME)
Abstract: We propose a comprehensive sample-based method for assessing the quality of generative models. The proposed approach enables the estimation of the probability that two sets of samples are drawn from the same distribution, providing a statistically rigorous method for assessing the performance of a single generative model or the comparison of multiple competing models trained on the same dataset. This comparison can be conducted by dividing the space into non-overlapping regions and comparing the number of data samples in each region. The method only requires samples from the generative model and the test data. It is capable of functioning directly on high-dimensional data, obviating the need for dimensionality reduction. Significantly, the proposed method does not depend on assumptions regarding the density of the true distribution, and it does not rely on training or fitting any auxiliary models. Instead, it focuses on approximating the integral of the density (probability mass) across various sub-regions within the data space.
- [983] arXiv:2402.04376 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Scaling laws for learning with real and surrogate dataSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: Collecting large quantities of high-quality data is often prohibitively expensive or impractical, and a crucial bottleneck in machine learning. One may instead augment a small set of $n$ data points from the target distribution with data from more accessible sources like public datasets, data collected under different circumstances, or synthesized by generative models. Blurring distinctions, we refer to such data as `surrogate data'.
We define a simple scheme for integrating surrogate data into training and use both theoretical models and empirical studies to explore its behavior. Our main findings are: $(i)$ Integrating surrogate data can significantly reduce the test error on the original distribution; $(ii)$ In order to reap this benefit, it is crucial to use optimally weighted empirical risk minimization; $(iii)$ The test error of models trained on mixtures of real and surrogate data is well described by a scaling law. This can be used to predict the optimal weighting and the gain from surrogate data. - [984] arXiv:2402.04390 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Densely Multiplied Physics Informed Neural NetworksComments: 15 pages, 9 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Although physics-informed neural networks (PINNs) have shown great potential in dealing with nonlinear partial differential equations (PDEs), it is common that PINNs will suffer from the problem of insufficient precision or obtaining incorrect outcomes. Unlike most of the existing solutions trying to enhance the ability of PINN by optimizing the training process, this paper improved the neural network architecture to improve the performance of PINN. We propose a densely multiply PINN (DM-PINN) architecture, which multiplies the output of a hidden layer with the outputs of all the behind hidden layers. Without introducing more trainable parameters, this effective mechanism can significantly improve the accuracy of PINNs. The proposed architecture is evaluated on four benchmark examples (Allan-Cahn equation, Helmholtz equation, Burgers equation and 1D convection equation). Comparisons between the proposed architecture and different PINN structures demonstrate the superior performance of the DM-PINN in both accuracy and efficiency.
- [985] arXiv:2402.04396 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice CodebooksComments: PreprintSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Post-training quantization (PTQ) reduces the memory footprint of LLMs by quantizing their weights to low-precision. In this work, we introduce QuIP#, a weight-only PTQ method that achieves state-of-the-art results in extreme compression regimes ($\le$ 4 bits per weight) using three novel techniques. First, QuIP# improves the incoherence processing from QuIP by using the randomized Hadamard transform, which is faster and has better theoretical properties. Second, QuIP# uses vector quantization techniques to take advantage of the ball-shaped sub-Gaussian distribution that incoherent weights possess: specifically, we introduce a set of hardware-efficient codebooks based on the highly symmetric $E_8$ lattice, which achieves the optimal 8-dimension unit ball packing. Third, QuIP# uses fine-tuning to improve fidelity to the original model. Our experiments show that QuIP# outperforms existing PTQ methods, enables new behaviors in PTQ scaling, and supports fast inference.
- [986] arXiv:2402.04398 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Learning from Time Series under Temporal Label NoiseSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: Many sequential classification tasks are affected by label noise that varies over time. Such noise can cause label quality to improve, worsen, or periodically change over time. We first propose and formalize temporal label noise, an unstudied problem for sequential classification of time series. In this setting, multiple labels are recorded in sequence while being corrupted by a time-dependent noise function. We first demonstrate the importance of modelling the temporal nature of the label noise function and how existing methods will consistently underperform. We then propose methods that can train noise-tolerant classifiers by estimating the temporal label noise function directly from data. We show that our methods lead to state-of-the-art performance in the presence of diverse temporal label noise functions using real and synthetic data.
- [987] arXiv:2402.04400 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: CEHR-GPT: Generating Electronic Health Records with Chronological Patient TimelinesChao Pang , Xinzhuo Jiang , Nishanth Parameshwar Pavinkurve , Krishna S. Kalluri , Elise L. Minto , Jason Patterson , Linying Zhang , George Hripcsak , Gamze Gürsoy , Noémie Elhadad , Karthik NatarajanSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: Synthetic Electronic Health Records (EHR) have emerged as a pivotal tool in advancing healthcare applications and machine learning models, particularly for researchers without direct access to healthcare data. Although existing methods, like rule-based approaches and generative adversarial networks (GANs), generate synthetic data that resembles real-world EHR data, these methods often use a tabular format, disregarding temporal dependencies in patient histories and limiting data replication. Recently, there has been a growing interest in leveraging Generative Pre-trained Transformers (GPT) for EHR data. This enables applications like disease progression analysis, population estimation, counterfactual reasoning, and synthetic data generation. In this work, we focus on synthetic data generation and demonstrate the capability of training a GPT model using a particular patient representation derived from CEHR-BERT, enabling us to generate patient sequences that can be seamlessly converted to the Observational Medical Outcomes Partnership (OMOP) data format.
- [988] arXiv:2402.04409 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Towards Fair, Robust and Efficient Client Contribution Evaluation in Federated LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: The performance of clients in Federated Learning (FL) can vary due to various reasons. Assessing the contributions of each client is crucial for client selection and compensation. It is challenging because clients often have non-independent and identically distributed (non-iid) data, leading to potentially noisy or divergent updates. The risk of malicious clients amplifies the challenge especially when there's no access to clients' local data or a benchmark root dataset. In this paper, we introduce a novel method called Fair, Robust, and Efficient Client Assessment (FRECA) for quantifying client contributions in FL. FRECA employs a framework called FedTruth to estimate the global model's ground truth update, balancing contributions from all clients while filtering out impacts from malicious ones. This approach is robust against Byzantine attacks and incorporates a Byzantine-resilient aggregation algorithm. FRECA is also efficient, as it operates solely on local model updates and requires no validation operations or datasets. Our experimental results show that FRECA can accurately and efficiently quantify client contributions in a robust manner.
- [989] arXiv:2402.04412 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: The VampPrior Mixture ModelSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: Current clustering priors for deep latent variable models (DLVMs) require defining the number of clusters a-priori and are susceptible to poor initializations. Addressing these deficiencies could greatly benefit deep learning-based scRNA-seq analysis by performing integration and clustering simultaneously. We adapt the VampPrior (Tomczak & Welling, 2018) into a Dirichlet process Gaussian mixture model, resulting in the VampPrior Mixture Model (VMM), a novel prior for DLVMs. We propose an inference procedure that alternates between variational inference and Empirical Bayes to cleanly distinguish variational and prior parameters. Using the VMM in a Variational Autoencoder attains highly competitive clustering performance on benchmark datasets. Augmenting scVI (Lopez et al., 2018), a popular scRNA-seq integration method, with the VMM significantly improves its performance and automatically arranges cells into biologically meaningful clusters.
- [990] arXiv:2402.04420 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Measuring machine learning harms from stereotypes: requires understanding who is being harmed by which errors in what waysComments: earlier draft non-archival at EAAMO 2023Subjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: As machine learning applications proliferate, we need an understanding of their potential for harm. However, current fairness metrics are rarely grounded in human psychological experiences of harm. Drawing on the social psychology of stereotypes, we use a case study of gender stereotypes in image search to examine how people react to machine learning errors. First, we use survey studies to show that not all machine learning errors reflect stereotypes nor are equally harmful. Then, in experimental studies we randomly expose participants to stereotype-reinforcing, -violating, and -neutral machine learning errors. We find stereotype-reinforcing errors induce more experientially (i.e., subjectively) harmful experiences, while having minimal changes to cognitive beliefs, attitudes, or behaviors. This experiential harm impacts women more than men. However, certain stereotype-violating errors are more experientially harmful for men, potentially due to perceived threats to masculinity. We conclude that harm cannot be the sole guide in fairness mitigation, and propose a nuanced perspective depending on who is experiencing what harm and why.
- [991] arXiv:2402.04421 (cross-list from cs.SE) [ pdf , ps , html , other ]
-
Title: Studying Vulnerable Code Entities in RComments: 5 pages, 3 figures, and 2 tables. to be published in ICPC 2024Subjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: Pre-trained Code Language Models (Code-PLMs) have shown many advancements and achieved state-of-the-art results for many software engineering tasks in the past few years. These models are mainly targeted for popular programming languages such as Java and Python, leaving out many other ones like R. Though R has a wide community of developers and users, there is little known about the applicability of Code-PLMs for R. In this preliminary study, we aim to investigate the vulnerability of Code-PLMs for code entities in R. For this purpose, we use an R dataset of code and comment pairs and then apply CodeAttack, a black-box attack model that uses the structure of code to generate adversarial code samples. We investigate how the model can attack different entities in R. This is the first step towards understanding the importance of R token types, compared to popular programming languages (e.g., Java). We limit our study to code summarization. Our results show that the most vulnerable code entity is the identifier, followed by some syntax tokens specific to R. The results can shed light on the importance of token types and help in developing models for code summarization and method name prediction for the R language.
- [992] arXiv:2402.04435 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: PreGIP: Watermarking the Pretraining of Graph Neural Networks for Deep Intellectual Property ProtectionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Pretraining on Graph Neural Networks (GNNs) has shown great power in facilitating various downstream tasks. As pretraining generally requires huge amount of data and computational resources, the pretrained GNNs are high-value Intellectual Properties (IP) of the legitimate owner. However, adversaries may illegally copy and deploy the pretrained GNN models for their downstream tasks. Though initial efforts have been made to watermark GNN classifiers for IP protection, these methods require the target classification task for watermarking, and thus are not applicable to self-supervised pretraining of GNN models. Hence, in this work, we propose a novel framework named PreGIP to watermark the pretraining of GNN encoder for IP protection while maintain the high-quality of the embedding space. PreGIP incorporates a task-free watermarking loss to watermark the embedding space of pretrained GNN encoder. A finetuning-resistant watermark injection is further deployed. Theoretical analysis and extensive experiments show the effectiveness of {\method} in IP protection and maintaining high-performance for downstream tasks.
- [993] arXiv:2402.04466 (cross-list from cs.SE) [ pdf , ps , html , other ]
-
Title: Towards Deterministic End-to-end Latency for Medical AI Systems in NVIDIA HoloscanSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Operating Systems (cs.OS)
Abstract: The introduction of AI and ML technologies into medical devices has revolutionized healthcare diagnostics and treatments. Medical device manufacturers are keen to maximize the advantages afforded by AI and ML by consolidating multiple applications onto a single platform. However, concurrent execution of several AI applications, each with its own visualization components, leads to unpredictable end-to-end latency, primarily due to GPU resource contentions. To mitigate this, manufacturers typically deploy separate workstations for distinct AI applications, thereby increasing financial, energy, and maintenance costs. This paper addresses these challenges within the context of NVIDIA's Holoscan platform, a real-time AI system for streaming sensor data and images. We propose a system design optimized for heterogeneous GPU workloads, encompassing both compute and graphics tasks. Our design leverages CUDA MPS for spatial partitioning of compute workloads and isolates compute and graphics processing onto separate GPUs. We demonstrate significant performance improvements across various end-to-end latency determinism metrics through empirical evaluation with real-world Holoscan medical device applications. For instance, the proposed design reduces maximum latency by 21-30% and improves latency distribution flatness by 17-25% for up to five concurrent endoscopy tool tracking AI applications, compared to a single-GPU baseline. Against a default multi-GPU setup, our optimizations decrease maximum latency by 35% for up to six concurrent applications by improving GPU utilization by 42%. This paper provides clear design insights for AI applications in the edge-computing domain including medical systems, where performance predictability of concurrent and heterogeneous GPU workloads is a critical requirement.
- [994] arXiv:2402.04476 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Dual-View Visual Contextualization for Web NavigationComments: Accepted to CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Automatic web navigation aims to build a web agent that can follow language instructions to execute complex and diverse tasks on real-world websites. Existing work primarily takes HTML documents as input, which define the contents and action spaces (i.e., actionable elements and operations) of webpages. Nevertheless, HTML documents may not provide a clear task-related context for each element, making it hard to select the right (sequence of) actions. In this paper, we propose to contextualize HTML elements through their "dual views" in webpage screenshots: each HTML element has its corresponding bounding box and visual content in the screenshot. We build upon the insight -- web developers tend to arrange task-related elements nearby on webpages to enhance user experiences -- and propose to contextualize each element with its neighbor elements, using both textual and visual features. The resulting representations of HTML elements are more informative for the agent to take action. We validate our method on the recently released Mind2Web dataset, which features diverse navigation domains and tasks on real-world websites. Our method consistently outperforms the baseline in all the scenarios, including cross-task, cross-website, and cross-domain ones.
- [995] arXiv:2402.04477 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Detecting Mode Collapse in Language Models via NarrationComments: To appear in the proceedings of the first Workshop on the Scaling Behavior of Large Language Models (EACL 2024)Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: No two authors write alike. Personal flourishes invoked in written narratives, from lexicon to rhetorical devices, imply a particular author--what literary theorists label the implied or virtual author; distinct from the real author or narrator of a text. Early large language models trained on unfiltered training sets drawn from a variety of discordant sources yielded incoherent personalities, problematic for conversational tasks but proving useful for sampling literature from multiple perspectives. Successes in alignment research in recent years have allowed researchers to impose subjectively consistent personae on language models via instruction tuning and reinforcement learning from human feedback (RLHF), but whether aligned models retain the ability to model an arbitrary virtual author has received little scrutiny. By studying 4,374 stories sampled from three OpenAI language models, we show successive versions of GPT-3 suffer from increasing degrees of "mode collapse" whereby overfitting the model during alignment constrains it from generalizing over authorship: models suffering from mode collapse become unable to assume a multiplicity of perspectives. Our method and results are significant for researchers seeking to employ language models in sociological simulations.
- [996] arXiv:2402.04494 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Grandmaster-Level Chess Without SearchAnian Ruoss , Grégoire Delétang , Sourabh Medapati , Jordi Grau-Moya , Li Kevin Wenliang , Elliot Catt , John Reid , Tim GeneweinSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: The recent breakthrough successes in machine learning are mainly attributed to scale: namely large-scale attention-based architectures and datasets of unprecedented scale. This paper investigates the impact of training at scale for chess. Unlike traditional chess engines that rely on complex heuristics, explicit search, or a combination of both, we train a 270M parameter transformer model with supervised learning on a dataset of 10 million chess games. We annotate each board in the dataset with action-values provided by the powerful Stockfish 16 engine, leading to roughly 15 billion data points. Our largest model reaches a Lichess blitz Elo of 2895 against humans, and successfully solves a series of challenging chess puzzles, without any domain-specific tweaks or explicit search algorithms. We also show that our model outperforms AlphaZero's policy and value networks (without MCTS) and GPT-3.5-turbo-instruct. A systematic investigation of model and dataset size shows that strong chess performance only arises at sufficient scale. To validate our results, we perform an extensive series of ablations of design choices and hyperparameters.
- [997] arXiv:2402.04515 (cross-list from cs.NI) [ pdf , ps , html , other ]
-
Title: A Deep Reinforcement Learning Approach for Adaptive Traffic Routing in Next-gen NetworksComments: Accepted for publication in the Proceedings of the IEEE International Conference on Communications (IEEE ICC 2024)Subjects: Networking and Internet Architecture (cs.NI) ; Artificial Intelligence (cs.AI)
Abstract: Next-gen networks require significant evolution of management to enable automation and adaptively adjust network configuration based on traffic dynamics. The advent of software-defined networking (SDN) and programmable switches enables flexibility and programmability. However, traditional techniques that decide traffic policies are usually based on hand-crafted programming optimization and heuristic algorithms. These techniques make non-realistic assumptions, e.g., considering static network load and topology, to obtain tractable solutions, which are inadequate for next-gen networks. In this paper, we design and develop a deep reinforcement learning (DRL) approach for adaptive traffic routing. We design a deep graph convolutional neural network (DGCNN) integrated into the DRL framework to learn the traffic behavior from not only the network topology but also link and node attributes. We adopt the Deep Q-Learning technique to train the DGCNN model in the DRL framework without the need for a labeled training dataset, enabling the framework to quickly adapt to traffic dynamics. The model leverages q-value estimates to select the routing path for every traffic flow request, balancing exploration and exploitation. We perform extensive experiments with various traffic patterns and compare the performance of the proposed approach with the Open Shortest Path First (OSPF) protocol. The experimental results show the effectiveness and adaptiveness of the proposed framework by increasing the network throughput by up to 7.8% and reducing the traffic delay by up to 16.1% compared to OSPF.
- [998] arXiv:2402.04520 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: On Computational Limits of Modern Hopfield Models: A Fine-Grained Complexity AnalysisComments: 31 pages; v2: fix typos; v3: fix typos, add clarifications, add referencesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: We investigate the computational limits of the memory retrieval dynamics of modern Hopfield models from the fine-grained complexity analysis. Our key contribution is the characterization of a phase transition behavior in the efficiency of all possible modern Hopfield models based on the norm of patterns. Specifically, we establish an upper bound criterion for the norm of input query patterns and memory patterns. Only below this criterion, sub-quadratic (efficient) variants of the modern Hopfield model exist, assuming the Strong Exponential Time Hypothesis (SETH). To showcase our theory, we provide a formal example of efficient constructions of modern Hopfield models using low-rank approximation when the efficient criterion holds. This includes a derivation of a lower bound on the computational time, scaling linearly with $\max\{$\# of stored memory patterns, length of input query sequence$\}$. In addition, we prove its memory retrieval error bound and exponential memory capacity.
- [999] arXiv:2402.04527 (cross-list from cs.IR) [ pdf , ps , html , other ]
-
Title: RA-Rec: An Efficient ID Representation Alignment Framework for LLM-based RecommendationComments: 10 pagesSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI)
Abstract: Large language models (LLM) have recently emerged as a powerful tool for a variety of natural language processing tasks, bringing a new surge of combining LLM with recommendation systems, termed as LLM-based RS. Current approaches generally fall into two main paradigms, the ID direct usage paradigm and the ID translation paradigm, noting their core weakness stems from lacking recommendation knowledge and uniqueness. To address this limitation, we propose a new paradigm, ID representation, which incorporates pre-trained ID embeddings into LLMs in a complementary manner. In this work, we present RA-Rec, an efficient ID representation alignment framework for LLM-based recommendation, which is compatible with multiple ID-based methods and LLM architectures. Specifically, we treat ID embeddings as soft prompts and design an innovative alignment module and an efficient tuning method with tailored data construction for alignment. Extensive experiments demonstrate RA-Rec substantially outperforms current state-of-the-art methods, achieving up to 3.0% absolute HitRate@100 improvements while utilizing less than 10x training data.
- [1000] arXiv:2402.04536 (cross-list from cs.RO) [ pdf , ps , html , other ]
-
Title: Tactile-based Object Retrieval From Granular MediaJingxi Xu , Yinsen Jia , Dongxiao Yang , Patrick Meng , Xinyue Zhu , Zihan Guo , Shuran Song , Matei CiocarlieSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: We introduce GEOTACT, a robotic manipulation method capable of retrieving objects buried in granular media. This is a challenging task due to the need to interact with granular media, and doing so based exclusively on tactile feedback, since a buried object can be completely hidden from vision. Tactile feedback is in itself challenging in this context, due to ubiquitous contact with the surrounding media, and the inherent noise level induced by the tactile readings. To address these challenges, we use a learning method trained end-to-end with simulated sensor noise. We show that our problem formulation leads to the natural emergence of learned pushing behaviors that the manipulator uses to reduce uncertainty and funnel the object to a stable grasp despite spurious and noisy tactile readings. We also introduce a training curriculum that enables learning these behaviors in simulation, followed by zero-shot transfer to real hardware. To the best of our knowledge, GEOTACT is the first method to reliably retrieve a number of different objects from a granular environment, doing so on real hardware and with integrated tactile sensing. Videos and additional information can be found at this https URL .
- [1001] arXiv:2402.04539 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Learning Diverse Policies with Soft Self-Generated GuidanceComments: 23 pages, 19 figuresJournal-ref: International Journal of Intelligent Systems, Volume 2023Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Reinforcement learning (RL) with sparse and deceptive rewards is challenging because non-zero rewards are rarely obtained. Hence, the gradient calculated by the agent can be stochastic and without valid information. Recent studies that utilize memory buffers of previous experiences can lead to a more efficient learning process. However, existing methods often require these experiences to be successful and may overly exploit them, which can cause the agent to adopt suboptimal behaviors. This paper develops an approach that uses diverse past trajectories for faster and more efficient online RL, even if these trajectories are suboptimal or not highly rewarded. The proposed algorithm combines a policy improvement step with an additional exploration step using offline demonstration data. The main contribution of this paper is that by regarding diverse past trajectories as guidance, instead of imitating them, our method directs its policy to follow and expand past trajectories while still being able to learn without rewards and approach optimality. Furthermore, a novel diversity measurement is introduced to maintain the team's diversity and regulate exploration. The proposed algorithm is evaluated on discrete and continuous control tasks with sparse and deceptive rewards. Compared with the existing RL methods, the experimental results indicate that our proposed algorithm is significantly better than the baseline methods regarding diverse exploration and avoiding local optima.
- [1002] arXiv:2402.04563 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Attention Guided CAM: Visual Explanations of Vision Transformer Guided by Self-AttentionComments: AAAI2024. Code available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Vision Transformer(ViT) is one of the most widely used models in the computer vision field with its great performance on various tasks. In order to fully utilize the ViT-based architecture in various applications, proper visualization methods with a decent localization performance are necessary, but these methods employed in CNN-based models are still not available in ViT due to its unique structure. In this work, we propose an attention-guided visualization method applied to ViT that provides a high-level semantic explanation for its decision. Our method selectively aggregates the gradients directly propagated from the classification output to each self-attention, collecting the contribution of image features extracted from each location of the input image. These gradients are additionally guided by the normalized self-attention scores, which are the pairwise patch correlation scores. They are used to supplement the gradients on the patch-level context information efficiently detected by the self-attention mechanism. This approach of our method provides elaborate high-level semantic explanations with great localization performance only with the class labels. As a result, our method outperforms the previous leading explainability methods of ViT in the weakly-supervised localization task and presents great capability in capturing the full instances of the target class object. Meanwhile, our method provides a visualization that faithfully explains the model, which is demonstrated in the perturbation comparison test.
- [1003] arXiv:2402.04567 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: OIL-AD: An Anomaly Detection Framework for Sequential Decision SequencesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Anomaly detection in decision-making sequences is a challenging problem due to the complexity of normality representation learning and the sequential nature of the task. Most existing methods based on Reinforcement Learning (RL) are difficult to implement in the real world due to unrealistic assumptions, such as having access to environment dynamics, reward signals, and online interactions with the environment. To address these limitations, we propose an unsupervised method named Offline Imitation Learning based Anomaly Detection (OIL-AD), which detects anomalies in decision-making sequences using two extracted behaviour features: action optimality and sequential association. Our offline learning model is an adaptation of behavioural cloning with a transformer policy network, where we modify the training process to learn a Q function and a state value function from normal trajectories. We propose that the Q function and the state value function can provide sufficient information about agents' behavioural data, from which we derive two features for anomaly detection. The intuition behind our method is that the action optimality feature derived from the Q function can differentiate the optimal action from others at each local state, and the sequential association feature derived from the state value function has the potential to maintain the temporal correlations between decisions (state-action pairs). Our experiments show that OIL-AD can achieve outstanding online anomaly detection performance with up to 34.8% improvement in F1 score over comparable baselines.
- [1004] arXiv:2402.04580 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: A Comprehensive Survey of Cross-Domain Policy Transfer for Embodied AgentsSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The burgeoning fields of robot learning and embodied AI have triggered an increasing demand for large quantities of data. However, collecting sufficient unbiased data from the target domain remains a challenge due to costly data collection processes and stringent safety requirements. Consequently, researchers often resort to data from easily accessible source domains, such as simulation and laboratory environments, for cost-effective data acquisition and rapid model iteration. Nevertheless, the environments and embodiments of these source domains can be quite different from their target domain counterparts, underscoring the need for effective cross-domain policy transfer approaches. In this paper, we conduct a systematic review of existing cross-domain policy transfer methods. Through a nuanced categorization of domain gaps, we encapsulate the overarching insights and design considerations of each problem setting. We also provide a high-level discussion about the key methodologies used in cross-domain policy transfer problems. Lastly, we summarize the open challenges that lie beyond the capabilities of current paradigms and discuss potential future directions in this field.
- [1005] arXiv:2402.04596 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Towards Improved Imbalance Robustness in Continual Multi-Label Learning with Dual Output Spiking Architecture (DOSA)Comments: 8 pages, 4 figures, 4 tables, 45 references. Submitted to IJCNN 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Algorithms designed for addressing typical supervised classification problems can only learn from a fixed set of samples and labels, making them unsuitable for the real world, where data arrives as a stream of samples often associated with multiple labels over time. This motivates the study of task-agnostic continual multi-label learning problems. While algorithms using deep learning approaches for continual multi-label learning have been proposed in the recent literature, they tend to be computationally heavy. Although spiking neural networks (SNNs) offer a computationally efficient alternative to artificial neural networks, existing literature has not used SNNs for continual multi-label learning. Also, accurately determining multiple labels with SNNs is still an open research problem. This work proposes a dual output spiking architecture (DOSA) to bridge these research gaps. A novel imbalance-aware loss function is also proposed, improving the multi-label classification performance of the model by making it more robust to data imbalance. A modified F1 score is presented to evaluate the effectiveness of the proposed loss function in handling imbalance. Experiments on several benchmark multi-label datasets show that DOSA trained with the proposed loss function shows improved robustness to data imbalance and obtains better continual multi-label learning performance than CIFDM, a previous state-of-the-art algorithm.
- [1006] arXiv:2402.04599 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Meet JEANIE: a Similarity Measure for 3D Skeleton Sequences via Temporal-Viewpoint AlignmentComments: Accepted by the International Journal of Computer Vision (IJCV). An extension of our ACCV'22 paper [arXiv: arXiv:2210.16820 ] which was distinguished by the Sang Uk Lee Best Student Paper AwardSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Video sequences exhibit significant nuisance variations (undesired effects) of speed of actions, temporal locations, and subjects' poses, leading to temporal-viewpoint misalignment when comparing two sets of frames or evaluating the similarity of two sequences. Thus, we propose Joint tEmporal and cAmera viewpoiNt alIgnmEnt (JEANIE) for sequence pairs. In particular, we focus on 3D skeleton sequences whose camera and subjects' poses can be easily manipulated in 3D. We evaluate JEANIE on skeletal Few-shot Action Recognition (FSAR), where matching well temporal blocks (temporal chunks that make up a sequence) of support-query sequence pairs (by factoring out nuisance variations) is essential due to limited samples of novel classes. Given a query sequence, we create its several views by simulating several camera locations. For a support sequence, we match it with view-simulated query sequences, as in the popular Dynamic Time Warping (DTW). Specifically, each support temporal block can be matched to the query temporal block with the same or adjacent (next) temporal index, and adjacent camera views to achieve joint local temporal-viewpoint warping. JEANIE selects the smallest distance among matching paths with different temporal-viewpoint warping patterns, an advantage over DTW which only performs temporal alignment. We also propose an unsupervised FSAR akin to clustering of sequences with JEANIE as a distance measure. JEANIE achieves state-of-the-art results on NTU-60, NTU-120, Kinetics-skeleton and UWA3D Multiview Activity II on supervised and unsupervised FSAR, and their meta-learning inspired fusion.
- [1007] arXiv:2402.04601 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Alirector: Alignment-Enhanced Chinese Grammatical Error CorrectorSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Chinese grammatical error correction (CGEC) faces serious overcorrection challenges when employing autoregressive generative models such as sequence-to-sequence (Seq2Seq) models and decoder-only large language models (LLMs). While previous methods aim to address overcorrection in Seq2Seq models, they are difficult to adapt to decoder-only LLMs. In this paper, we propose an alignment-enhanced corrector for the overcorrection problem that applies to both Seq2Seq models and decoder-only LLMs. Our method first trains a correction model to generate an initial correction of the source sentence. Then, we combine the source sentence with the initial correction and feed it through an alignment model for another round of correction, aiming to enforce the alignment model to focus on potential overcorrection. Moreover, to enhance the model's ability to identify nuances, we further explore the reverse alignment of the source sentence and the initial correction. Finally, we transfer the alignment knowledge from two alignment models to the correction model, instructing it on how to avoid overcorrection. Experimental results on three CGEC datasets demonstrate the effectiveness of our approach in alleviating overcorrection and improving overall performance.
- [1008] arXiv:2402.04615 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: ScreenAI: A Vision-Language Model for UI and Infographics UnderstandingGilles Baechler , Srinivas Sunkara , Maria Wang , Fedir Zubach , Hassan Mansoor , Vincent Etter , Victor Cărbune , Jason Lin , Jindong Chen , Abhanshu SharmaComments: Revision notes: 1) In Appendix I, added dataset location for ScreenQA Short in Appendix I. 2) In Table 4, updated evaluation numbers for Screen Annotation and Complex Screen QA benchmarks as the datasets are updated. 3) Updated Figure 4 to reflect the changes in evaluation numbers described in 2). 4) Minor revisions in other placesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Screen user interfaces (UIs) and infographics, sharing similar visual language and design principles, play important roles in human communication and human-machine interaction. We introduce ScreenAI, a vision-language model that specializes in UI and infographics understanding. Our model improves upon the PaLI architecture with the flexible patching strategy of pix2struct and is trained on a unique mixture of datasets. At the heart of this mixture is a novel screen annotation task in which the model has to identify the type and location of UI elements. We use these text annotations to describe screens to Large Language Models and automatically generate question-answering (QA), UI navigation, and summarization training datasets at scale. We run ablation studies to demonstrate the impact of these design choices. At only 5B parameters, ScreenAI achieves new state-of-the-artresults on UI- and infographics-based tasks (Multi-page DocVQA, WebSRC, MoTIF and Widget Captioning), and new best-in-class performance on others (Chart QA, DocVQA, and InfographicVQA) compared to models of similar size. Finally, we release three new datasets: one focused on the screen annotation task and two others focused on question answering.
- [1009] arXiv:2402.04616 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: TinyLLM: Learning a Small Student from Multiple Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Transferring the reasoning capability from stronger large language models (LLMs) to smaller ones has been quite appealing, as smaller LLMs are more flexible to deploy with less expense. Among the existing solutions, knowledge distillation stands out due to its outstanding efficiency and generalization. However, existing methods suffer from several drawbacks, including limited knowledge diversity and the lack of rich contextual information. To solve the problems and facilitate the learning of compact language models, we propose TinyLLM, a new knowledge distillation paradigm to learn a small student LLM from multiple large teacher LLMs. In particular, we encourage the student LLM to not only generate the correct answers but also understand the rationales behind these answers. Given that different LLMs possess diverse reasoning skills, we guide the student model to assimilate knowledge from various teacher LLMs. We further introduce an in-context example generator and a teacher-forcing Chain-of-Thought strategy to ensure that the rationales are accurate and grounded in contextually appropriate scenarios. Extensive experiments on six datasets across two reasoning tasks demonstrate the superiority of our method. Results show that TinyLLM can outperform large teacher LLMs significantly, despite a considerably smaller model size.
- [1010] arXiv:2402.04617 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free MemoryChaojun Xiao , Pengle Zhang , Xu Han , Guangxuan Xiao , Yankai Lin , Zhengyan Zhang , Zhiyuan Liu , Song Han , Maosong SunSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Large language models (LLMs) have emerged as a cornerstone in real-world applications with lengthy streaming inputs, such as LLM-driven agents. However, existing LLMs, pre-trained on sequences with restricted maximum length, cannot generalize to longer sequences due to the out-of-domain and distraction issues. To alleviate these issues, existing efforts employ sliding attention windows and discard distant tokens to achieve the processing of extremely long sequences. Unfortunately, these approaches inevitably fail to capture long-distance dependencies within sequences to deeply understand semantics. This paper introduces a training-free memory-based method, InfLLM, to unveil the intrinsic ability of LLMs to process streaming long sequences. Specifically, InfLLM stores distant contexts into additional memory units and employs an efficient mechanism to lookup token-relevant units for attention computation. Thereby, InfLLM allows LLMs to efficiently process long sequences while maintaining the ability to capture long-distance dependencies. Without any training, InfLLM enables LLMs pre-trained on sequences of a few thousand tokens to achieve superior performance than competitive baselines continually training these LLMs on long sequences. Even when the sequence length is scaled to $1,024$K, InfLLM still effectively captures long-distance dependencies.
- [1011] arXiv:2402.04644 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: LEVI: Generalizable Fine-tuning via Layer-wise Ensemble of Different ViewsYuji Roh , Qingyun Liu , Huan Gui , Zhe Yuan , Yujin Tang , Steven Euijong Whang , Liang Liu , Shuchao Bi , Lichan Hong , Ed H. Chi , Zhe ZhaoSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Fine-tuning is becoming widely used for leveraging the power of pre-trained foundation models in new downstream tasks. While there are many successes of fine-tuning on various tasks, recent studies have observed challenges in the generalization of fine-tuned models to unseen distributions (i.e., out-of-distribution; OOD). To improve OOD generalization, some previous studies identify the limitations of fine-tuning data and regulate fine-tuning to preserve the general representation learned from pre-training data. However, potential limitations in the pre-training data and models are often ignored. In this paper, we contend that overly relying on the pre-trained representation may hinder fine-tuning from learning essential representations for downstream tasks and thus hurt its OOD generalization. It can be especially catastrophic when new tasks are from different (sub)domains compared to pre-training data. To address the issues in both pre-training and fine-tuning data, we propose a novel generalizable fine-tuning method LEVI, where the pre-trained model is adaptively ensembled layer-wise with a small task-specific model, while preserving training and inference efficiencies. By combining two complementing models, LEVI effectively suppresses problematic features in both the fine-tuning data and pre-trained model and preserves useful features for new tasks. Broad experiments with large language and vision models show that LEVI greatly improves fine-tuning generalization via emphasizing different views from fine-tuning data and pre-trained features.
- [1012] arXiv:2402.04660 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: Adversarial Robustness Through Artifact DesignSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: Adversarial examples arose as a challenge for machine learning. To hinder them, most defenses alter how models are trained (e.g., adversarial training) or inference is made (e.g., randomized smoothing). Still, while these approaches markedly improve models' adversarial robustness, models remain highly susceptible to adversarial examples. Identifying that, in certain domains such as traffic-sign recognition, objects are implemented per standards specifying how artifacts (e.g., signs) should be designed, we propose a novel approach for improving adversarial robustness. Specifically, we offer a method to redefine standards, making minor changes to existing ones, to defend against adversarial examples. We formulate the problem of artifact design as a robust optimization problem, and propose gradient-based and greedy search methods to solve it. We evaluated our approach in the domain of traffic-sign recognition, allowing it to alter traffic-sign pictograms (i.e., symbols within the signs) and their colors. We found that, combined with adversarial training, our approach led to up to 25.18\% higher robust accuracy compared to state-of-the-art methods against two adversary types, while further increasing accuracy on benign inputs.
- [1013] arXiv:2402.04676 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Group Distributionally Robust Dataset Distillation with Risk MinimizationSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Dataset distillation (DD) has emerged as a widely adopted technique for crafting a synthetic dataset that captures the essential information of a training dataset, facilitating the training of accurate neural models. Its applications span various domains, including transfer learning, federated learning, and neural architecture search. The most popular methods for constructing the synthetic data rely on matching the convergence properties of training the model with the synthetic dataset and the training dataset. However, targeting the training dataset must be thought of as auxiliary in the same sense that the training set is an approximate substitute for the population distribution, and the latter is the data of interest. Yet despite its popularity, an aspect that remains unexplored is the relationship of DD to its generalization, particularly across uncommon subgroups. That is, how can we ensure that a model trained on the synthetic dataset performs well when faced with samples from regions with low population density? Here, the representativeness and coverage of the dataset become salient over the guaranteed training error at inference. Drawing inspiration from distributionally robust optimization, we introduce an algorithm that combines clustering with the minimization of a risk measure on the loss to conduct DD. We provide a theoretical rationale for our approach and demonstrate its effective generalization and robustness across subgroups through numerical experiments. The source code is available in this https URL .
- [1014] arXiv:2402.04678 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Large Language Models As Faithful ExplainersYu-Neng Chuang , Guanchu Wang , Chia-Yuan Chang , Ruixiang Tang , Fan Yang , Mengnan Du , Xuanting Cai , Xia HuSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Large Language Models (LLMs) have recently become proficient in addressing complex tasks by utilizing their rich internal knowledge and reasoning ability. Consequently, this complexity hinders traditional input-focused explanation algorithms for explaining the complex decision-making processes of LLMs. Recent advancements have thus emerged for self-explaining their predictions through a single feed-forward inference in a natural language format. However, natural language explanations are often criticized for lack of faithfulness since these explanations may not accurately reflect the decision-making behaviors of the LLMs. In this work, we introduce a generative explanation framework, xLLM, to improve the faithfulness of the explanations provided in natural language formats for LLMs. Specifically, we propose an evaluator to quantify the faithfulness of natural language explanation and enhance the faithfulness by an iterative optimization process of xLLM, with the goal of maximizing the faithfulness scores. Experiments conducted on three NLU datasets demonstrate that xLLM can significantly improve the faithfulness of generated explanations, which are in alignment with the behaviors of LLMs.
- [1015] arXiv:2402.04699 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: EvoSeed: Unveiling the Threat on Deep Neural Networks with Real-World IllusionsSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Abstract: Deep neural networks are exploited using natural adversarial samples, which have no impact on human perception but are misclassified. Current approaches often rely on the white-box nature of deep neural networks to generate these adversarial samples or alter the distribution of adversarial samples compared to training distribution. To alleviate the limitations of current approaches, we propose EvoSeed, a novel evolutionary strategy-based search algorithmic framework to generate natural adversarial samples. Our EvoSeed framework uses auxiliary Diffusion and Classifier models to operate in a model-agnostic black-box setting. We employ CMA-ES to optimize the search for an adversarial seed vector, which, when processed by the Conditional Diffusion Model, results in an unrestricted natural adversarial sample misclassified by the Classifier Model. Experiments show that generated adversarial images are of high image quality and are transferable to different classifiers. Our approach demonstrates promise in enhancing the quality of adversarial samples using evolutionary algorithms. We hope our research opens new avenues to enhance the robustness of deep neural networks in real-world scenarios. Project Website can be accessed at \url{ this https URL }.
- [1016] arXiv:2402.04763 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: Emergence of specialized Collective Behaviors in Evolving Heterogeneous SwarmsSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: Natural groups of animals, such as swarms of social insects, exhibit astonishing degrees of task specialization, useful to address complex tasks and to survive. This is supported by phenotypic plasticity: individuals sharing the same genotype that is expressed differently for different classes of individuals, each specializing in one task. In this work, we evolve a swarm of simulated robots with phenotypic plasticity to study the emergence of specialized collective behavior during an emergent perception task. Phenotypic plasticity is realized in the form of heterogeneity of behavior by dividing the genotype into two components, with one different neural network controller associated to each component. The whole genotype, expressing the behavior of the whole group through the two components, is subject to evolution with a single fitness function. We analyse the obtained behaviors and use the insights provided by these results to design an online regulatory mechanism. Our experiments show three main findings: 1) The sub-groups evolve distinct emergent behaviors. 2) The effectiveness of the whole swarm depends on the interaction between the two sub-groups, leading to a more robust performance than with singular sub-group behavior. 3) The online regulatory mechanism enhances overall performance and scalability.
- [1017] arXiv:2402.04779 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: StableMask: Refining Causal Masking in Decoder-only TransformerComments: PreprintSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The decoder-only Transformer architecture with causal masking and relative position encoding (RPE) has become the de facto choice in language modeling. Despite its exceptional performance across various tasks, we have identified two limitations: First, it requires all attention scores to be non-zero and sum up to 1, even if the current embedding has sufficient self-contained information. This compels the model to assign disproportional excessive attention to specific tokens. Second, RPE-based Transformers are not universal approximators due to their limited capacity at encoding absolute positional information, which limits their application in position-critical tasks. In this work, we propose StableMask: a parameter-free method to address both limitations by refining the causal mask. It introduces pseudo-attention values to balance attention distributions and encodes absolute positional information via a progressively decreasing mask ratio. StableMask's effectiveness is validated both theoretically and empirically, showing significant enhancements in language models with parameter sizes ranging from 71M to 1.4B across diverse datasets and encoding methods. We further show that it naturally supports (1) efficient extrapolation without special tricks such as StreamingLLM and (2) easy integration with existing attention optimization techniques.
- [1018] arXiv:2402.04788 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language BenchmarkDongping Chen , Ruoxi Chen , Shilin Zhang , Yinuo Liu , Yaochen Wang , Huichi Zhou , Qihui Zhang , Pan Zhou , Yao Wan , Lichao SunSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Multimodal Large Language Models (MLLMs) have gained significant attention recently, showing remarkable potential in artificial general intelligence. However, assessing the utility of MLLMs presents considerable challenges, primarily due to the absence multimodal benchmarks that align with human preferences. Inspired by LLM-as-a-Judge in LLMs, this paper introduces a novel benchmark, termed MLLM-as-a-Judge, to assess the ability of MLLMs in assisting judges including three distinct tasks: Scoring Evaluation, Pair Comparison, and Batch Ranking. Our study reveals that, while MLLMs demonstrate remarkable human-like discernment in Pair Comparisons, there is a significant divergence from human preferences in Scoring Evaluation and Batch Ranking tasks. Furthermore, MLLMs still face challenges in judgment, including diverse biases, hallucinatory responses, and inconsistencies, even for advanced models such as GPT-4V. These findings emphasize the pressing need for enhancements and further research efforts regarding MLLMs as fully reliable evaluators. Code and dataset are available at this https URL .
- [1019] arXiv:2402.04836 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: On the Completeness of Invariant Geometric Deep Learning ModelsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Invariant models, one important class of geometric deep learning models, are capable of generating meaningful geometric representations by leveraging informative geometric features. These models are characterized by their simplicity, good experimental results and computational efficiency. However, their theoretical expressive power still remains unclear, restricting a deeper understanding of the potential of such models. In this work, we concentrate on characterizing the theoretical expressiveness of invariant models. We first rigorously bound the expressiveness of the most classical invariant model, Vanilla DisGNN (message passing neural networks incorporating distance), restricting its unidentifiable cases to be only those highly symmetric geometric graphs. To break these corner cases' symmetry, we introduce a simple yet E(3)-complete invariant design by nesting Vanilla DisGNN, named GeoNGNN. Leveraging GeoNGNN as a theoretical tool, we for the first time prove the E(3)-completeness of three well-established geometric models: DimeNet, GemNet and SphereNet. Our results fill the gap in the theoretical power of invariant models, contributing to a rigorous and comprehensive understanding of their capabilities. Experimentally, GeoNGNN exhibits good inductive bias in capturing local environments, and achieves competitive results w.r.t. complicated models relying on high-order invariant/equivariant representations while exhibiting significantly faster computational speed.
- [1020] arXiv:2402.04838 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: PaDeLLM-NER: Parallel Decoding in Large Language Models for Named Entity RecognitionSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In this study, we aim to reduce generation latency for Named Entity Recognition (NER) with Large Language Models (LLMs). The main cause of high latency in LLMs is the sequential decoding process, which autoregressively generates all labels and mentions for NER, significantly increase the sequence length. To this end, we introduce Parallel Decoding in LLM for NE} (PaDeLLM-NER), a approach that integrates seamlessly into existing generative model frameworks without necessitating additional modules or architectural modifications. PaDeLLM-NER allows for the simultaneous decoding of all mentions, thereby reducing generation latency. Experiments reveal that PaDeLLM-NER significantly increases inference speed that is 1.76 to 10.22 times faster than the autoregressive approach for both English and Chinese. Simultaneously it maintains the quality of predictions as evidenced by the performance that is on par with the state-of-the-art across various datasets.
- [1021] arXiv:2402.04869 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Learning by Doing: An Online Causal Reinforcement Learning Framework with Causal-Aware PolicyRuichu Cai , Siyang Huang , Jie Qiao , Wei Chen , Yan Zeng , Keli Zhang , Fuchun Sun , Yang Yu , Zhifeng HaoSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: As a key component to intuitive cognition and reasoning solutions in human intelligence, causal knowledge provides great potential for reinforcement learning (RL) agents' interpretability towards decision-making by helping reduce the searching space. However, there is still a considerable gap in discovering and incorporating causality into RL, which hinders the rapid development of causal RL. In this paper, we consider explicitly modeling the generation process of states with the causal graphical model, based on which we augment the policy. We formulate the causal structure updating into the RL interaction process with active intervention learning of the environment. To optimize the derived objective, we propose a framework with theoretical performance guarantees that alternates between two steps: using interventions for causal structure learning during exploration and using the learned causal structure for policy guidance during exploitation. Due to the lack of public benchmarks that allow direct intervention in the state space, we design the root cause localization task in our simulated fault alarm environment and then empirically show the effectiveness and robustness of the proposed method against state-of-the-art baselines. Theoretical analysis shows that our performance improvement attributes to the virtuous cycle of causal-guided policy learning and causal structure learning, which aligns with our experimental results.
- [1022] arXiv:2402.04880 (cross-list from cs.DC) [ pdf , ps , html , other ]
-
Title: Combining Cloud and Mobile Computing for Machine LearningComments: Ruiqi Xu and Tianchi Zhang contributed equally to this workSubjects: Distributed, Parallel, and Cluster Computing (cs.DC) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Although the computing power of mobile devices is increasing, machine learning models are also growing in size. This trend creates problems for mobile devices due to limitations like their memory capacity and battery life. While many services, like ChatGPT and Midjourney, run all the inferences in the cloud, we believe a flexible and fine-grained task distribution is more desirable. In this work, we consider model segmentation as a solution to improving the user experience, dividing the computation between mobile devices and the cloud in a way that offloads the compute-heavy portion of the model while minimizing the data transfer required. We show that the division not only reduces the wait time for users but can also be fine-tuned to optimize the workloads of the cloud. To achieve that, we design a scheduler that collects information about network quality, client device capability, and job requirements, making decisions to achieve consistent performance across a range of devices while reducing the work the cloud needs to perform.
- [1023] arXiv:2402.04882 (cross-list from cs.NE) [ pdf , ps , html , other ]
-
Title: LMUFormer: Low Complexity Yet Powerful Spiking Model With Legendre Memory UnitsComments: The 12th International Conference on Learning Representations (ICLR 2024)Subjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract: Transformer models have demonstrated high accuracy in numerous applications but have high complexity and lack sequential processing capability making them ill-suited for many streaming applications at the edge where devices are heavily resource-constrained. Thus motivated, many researchers have proposed reformulating the transformer models as RNN modules which modify the self-attention computation with explicit states. However, these approaches often incur significant performance degradation. The ultimate goal is to develop a model that has the following properties: parallel training, streaming and low-cost inference, and SOTA performance. In this paper, we propose a new direction to achieve this goal. We show how architectural modifications to a recurrent model can help push its performance toward Transformer models while retaining its sequential processing capability. Specifically, inspired by the recent success of Legendre Memory Units (LMU) in sequence learning tasks, we propose LMUFormer, which augments the LMU with convolutional patch embedding and convolutional channel mixer. Moreover, we present a spiking version of this architecture, which introduces the benefit of states within the patch embedding and channel mixer modules while simultaneously reducing the computing complexity. We evaluated our architectures on multiple sequence datasets. In comparison to SOTA transformer-based models within the ANN domain on the SCv2 dataset, our LMUFormer demonstrates comparable performance while necessitating a remarkable 53 times reduction in parameters and a substantial 65 times decrement in FLOPs. Additionally, owing to our model's proficiency in real-time data processing, we can achieve a 32.03% reduction in sequence length, all while incurring an inconsequential decline in performance. Our code is publicly available at this https URL .
- [1024] arXiv:2402.04885 (cross-list from stat.ML) [ pdf , ps , html , other ]
-
Title: A Unified Gaussian Process for Branching and Nested Hyperparameter OptimizationSubjects: Machine Learning (stat.ML) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Choosing appropriate hyperparameters plays a crucial role in the success of neural networks as hyper-parameters directly control the behavior and performance of the training algorithms. To obtain efficient tuning, Bayesian optimization methods based on Gaussian process (GP) models are widely used. Despite numerous applications of Bayesian optimization in deep learning, the existing methodologies are developed based on a convenient but restrictive assumption that the tuning parameters are independent of each other. However, tuning parameters with conditional dependence are common in practice. In this paper, we focus on two types of them: branching and nested parameters. Nested parameters refer to those tuning parameters that exist only within a particular setting of another tuning parameter, and a parameter within which other parameters are nested is called a branching parameter. To capture the conditional dependence between branching and nested parameters, a unified Bayesian optimization framework is proposed. The sufficient conditions are rigorously derived to guarantee the validity of the kernel function, and the asymptotic convergence of the proposed optimization framework is proven under the continuum-armed-bandit setting. Based on the new GP model, which accounts for the dependent structure among input variables through a new kernel function, higher prediction accuracy and better optimization efficiency are observed in a series of synthetic simulations and real data applications of neural networks. Sensitivity analysis is also performed to provide insights into how changes in hyperparameter values affect prediction accuracy.
- [1025] arXiv:2402.04888 (cross-list from cs.IT) [ pdf , ps , other ]
-
Title: RSCNet: Dynamic CSI Compression for Cloud-based WiFi SensingComments: The paper has been accepted by IEEE International Conference on Communications (ICC) 2024Subjects: Information Theory (cs.IT) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Signal Processing (eess.SP)
Abstract: WiFi-enabled Internet-of-Things (IoT) devices are evolving from mere communication devices to sensing instruments, leveraging Channel State Information (CSI) extraction capabilities. Nevertheless, resource-constrained IoT devices and the intricacies of deep neural networks necessitate transmitting CSI to cloud servers for sensing. Although feasible, this leads to considerable communication overhead. In this context, this paper develops a novel Real-time Sensing and Compression Network (RSCNet) which enables sensing with compressed CSI; thereby reducing the communication overheads. RSCNet facilitates optimization across CSI windows composed of a few CSI frames. Once transmitted to cloud servers, it employs Long Short-Term Memory (LSTM) units to harness data from prior windows, thus bolstering both the sensing accuracy and CSI reconstruction. RSCNet adeptly balances the trade-off between CSI compression and sensing precision, thus streamlining real-time cloud-based WiFi sensing with reduced communication costs. Numerical findings demonstrate the gains of RSCNet over the existing benchmarks like SenseFi, showcasing a sensing accuracy of 97.4% with minimal CSI reconstruction error. Numerical results also show a computational analysis of the proposed RSCNet as a function of the number of CSI frames.
- [1026] arXiv:2402.04918 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Prompting Implicit Discourse Relation AnnotationComments: To appear at the Linguistic Annotation Workshop 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Pre-trained large language models, such as ChatGPT, archive outstanding performance in various reasoning tasks without supervised training and were found to have outperformed crowdsourcing workers. Nonetheless, ChatGPT's performance in the task of implicit discourse relation classification, prompted by a standard multiple-choice question, is still far from satisfactory and considerably inferior to state-of-the-art supervised approaches. This work investigates several proven prompting techniques to improve ChatGPT's recognition of discourse relations. In particular, we experimented with breaking down the classification task that involves numerous abstract labels into smaller subtasks. Nonetheless, experiment results show that the inference accuracy hardly changes even with sophisticated prompt engineering, suggesting that implicit discourse relation classification is not yet resolvable under zero-shot or few-shot settings.
- [1027] arXiv:2402.04929 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Source-Free Domain Adaptation with Diffusion-Guided Source Data GenerationComments: arXiv admin note: substantial text overlap with arXiv:2310.01701Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This paper introduces a novel approach to leverage the generalizability capability of Diffusion Models for Source-Free Domain Adaptation (DM-SFDA). Our proposed DM-SFDA method involves fine-tuning a pre-trained text-to-image diffusion model to generate source domain images using features from the target images to guide the diffusion process. Specifically, the pre-trained diffusion model is fine-tuned to generate source samples that minimize entropy and maximize confidence for the pre-trained source model. We then apply established unsupervised domain adaptation techniques to align the generated source images with target domain data. We validate our approach through comprehensive experiments across a range of datasets, including Office-31, Office-Home, and VisDA. The results highlight significant improvements in SFDA performance, showcasing the potential of diffusion models in generating contextually relevant, domain-specific images.
- [1028] arXiv:2402.04955 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: Chatbots in Knowledge-Intensive Contexts: Comparing Intent and LLM-Based SystemsComments: 10 pages, 7 figures, under review at an ACM venueSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Cognitive assistants (CA) are chatbots that provide context-aware support to human workers in knowledge-intensive tasks. Traditionally, cognitive assistants respond in specific ways to predefined user intents and conversation patterns. However, this rigidness does not handle the diversity of natural language well. Recent advances in natural language processing (NLP), powering large language models (LLM) such as GPT-4, Llama2, and Gemini, could enable CAs to converse in a more flexible, human-like manner. However, the additional degrees of freedom may have unforeseen consequences, especially in knowledge-intensive contexts where accuracy is crucial. As a preliminary step to assessing the potential of using LLMs in these contexts, we conducted a user study comparing an LLM-based CA to an intent-based system regarding interaction efficiency, user experience, workload, and usability. This revealed that LLM-based CAs exhibited better user experience, task completion rate, usability, and perceived performance than intent-based systems, suggesting that switching NLP techniques should be investigated further.
- [1029] arXiv:2402.04967 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Text or Image? What is More Important in Cross-Domain Generalization Capabilities of Hate Meme Detection Models?Comments: Accepted at EACL'2024 FindingsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: This paper delves into the formidable challenge of cross-domain generalization in multimodal hate meme detection, presenting compelling findings. We provide enough pieces of evidence supporting the hypothesis that only the textual component of hateful memes enables the existing multimodal classifier to generalize across different domains, while the image component proves highly sensitive to a specific training dataset. The evidence includes demonstrations showing that hate-text classifiers perform similarly to hate-meme classifiers in a zero-shot setting. Simultaneously, the introduction of captions generated from images of memes to the hate-meme classifier worsens performance by an average F1 of 0.02. Through blackbox explanations, we identify a substantial contribution of the text modality (average of 83%), which diminishes with the introduction of meme's image captions (52%). Additionally, our evaluation on a newly created confounder dataset reveals higher performance on text confounders as compared to image confounders with an average $\Delta$F1 of 0.18.
- [1030] arXiv:2402.04975 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: ChatScratch: An AI-Augmented System Toward Autonomous Visual Programming Learning for Children Aged 6-12Comments: 29 pages, 7 figures, accepted by CHI 2024Subjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Programming Languages (cs.PL)
Abstract: As Computational Thinking (CT) continues to permeate younger age groups in K-12 education, established CT platforms such as Scratch face challenges in catering to these younger learners, particularly those in the elementary school (ages 6-12). Through formative investigation with Scratch experts, we uncover three key obstacles to children's autonomous Scratch learning: artist's block in project planning, bounded creativity in asset creation, and inadequate coding guidance during implementation. To address these barriers, we introduce ChatScratch, an AI-augmented system to facilitate autonomous programming learning for young children. ChatScratch employs structured interactive storyboards and visual cues to overcome artist's block, integrates digital drawing and advanced image generation technologies to elevate creativity, and leverages Scratch-specialized Large Language Models (LLMs) for professional coding guidance. Our study shows that, compared to Scratch, ChatScratch efficiently fosters autonomous programming learning, and contributes to the creation of high-quality, personally meaningful Scratch projects for children.
- [1031] arXiv:2402.04978 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: An Enhanced Prompt-Based LLM Reasoning Scheme via Knowledge Graph-Integrated CollaborationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: While Large Language Models (LLMs) demonstrate exceptional performance in a multitude of Natural Language Processing (NLP) tasks, they encounter challenges in practical applications, including issues with hallucinations, inadequate knowledge updating, and limited transparency in the reasoning process. To overcome these limitations, this study innovatively proposes a collaborative training-free reasoning scheme involving tight cooperation between Knowledge Graph (KG) and LLMs. This scheme first involves using LLMs to iteratively explore KG, selectively retrieving a task-relevant knowledge subgraph to support reasoning. The LLMs are then guided to further combine inherent implicit knowledge to reason on the subgraph while explicitly elucidating the reasoning process. Through such a cooperative approach, our scheme achieves more reliable knowledge-based reasoning and facilitates the tracing of the reasoning results. Experimental results show that our scheme significantly progressed across multiple datasets, notably achieving over a 10% improvement on the QALD10 dataset compared to the best baseline and the fine-tuned state-of-the-art (SOTA) work. Building on this success, this study hopes to offer a valuable reference for future research in the fusion of KG and LLMs, thereby enhancing LLMs' proficiency in solving complex issues.
- [1032] arXiv:2402.04979 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Detection and Pose Estimation of flat, Texture-less Industry Objects on HoloLens using synthetic TrainingComments: Scandinavian Conference on Image Analysis 2023Journal-ref: In Scandinavian Conference on Image Analysis 2023 (pp. 569-585). Cham: Springer Nature SwitzerlandSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Current state-of-the-art 6d pose estimation is too compute intensive to be deployed on edge devices, such as Microsoft HoloLens (2) or Apple iPad, both used for an increasing number of augmented reality applications. The quality of AR is greatly dependent on its capabilities to detect and overlay geometry within the scene. We propose a synthetically trained client-server-based augmented reality application, demonstrating state-of-the-art object pose estimation of metallic and texture-less industry objects on edge devices. Synthetic data enables training without real photographs, i.e. for yet-to-be-manufactured objects. Our qualitative evaluation on an AR-assisted sorting task, and quantitative evaluation on both renderings, as well as real-world data recorded on HoloLens 2, sheds light on its real-world applicability.
- [1033] arXiv:2402.05007 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Example-based Explanations for Random Forests using Machine UnlearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Tree-based machine learning models, such as decision trees and random forests, have been hugely successful in classification tasks primarily because of their predictive power in supervised learning tasks and ease of interpretation. Despite their popularity and power, these models have been found to produce unexpected or discriminatory outcomes. Given their overwhelming success for most tasks, it is of interest to identify sources of their unexpected and discriminatory behavior. However, there has not been much work on understanding and debugging tree-based classifiers in the context of fairness.
We introduce FairDebugger, a system that utilizes recent advances in machine unlearning research to identify training data subsets responsible for instances of fairness violations in the outcomes of a random forest classifier. FairDebugger generates top-$k$ explanations (in the form of coherent training data subsets) for model unfairness. Toward this goal, FairDebugger first utilizes machine unlearning to estimate the change in the tree structures of the random forest when parts of the underlying training data are removed, and then leverages the Apriori algorithm from frequent itemset mining to reduce the subset search space. We empirically evaluate our approach on three real-world datasets, and demonstrate that the explanations generated by FairDebugger are consistent with insights from prior studies on these datasets. - [1034] arXiv:2402.05008 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: EfficientViT-SAM: Accelerated Segment Anything Model Without Performance LossComments: tech reportSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: We present EfficientViT-SAM, a new family of accelerated segment anything models. We retain SAM's lightweight prompt encoder and mask decoder while replacing the heavy image encoder with EfficientViT. For the training, we begin with the knowledge distillation from the SAM-ViT-H image encoder to EfficientViT. Subsequently, we conduct end-to-end training on the SA-1B dataset. Benefiting from EfficientViT's efficiency and capacity, EfficientViT-SAM delivers 48.9x measured TensorRT speedup on A100 GPU over SAM-ViT-H without sacrificing performance. Our code and pre-trained models are released at this https URL .
- [1035] arXiv:2402.05027 (cross-list from cs.MA) [ pdf , ps , other ]
-
Title: Towards Generalizability of Multi-Agent Reinforcement Learning in Graphs with Recurrent Message PassingComments: Accepted at AAMAS 2024, version with appendix; revised sections 1 and 7, corrected table 1, final results unchangedSubjects: Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI)
Abstract: Graph-based environments pose unique challenges to multi-agent reinforcement learning. In decentralized approaches, agents operate within a given graph and make decisions based on partial or outdated observations. The size of the observed neighborhood limits the generalizability to different graphs and affects the reactivity of agents, the quality of the selected actions, and the communication overhead. This work focuses on generalizability and resolves the trade-off in observed neighborhood size with a continuous information flow in the whole graph. We propose a recurrent message-passing model that iterates with the environment's steps and allows nodes to create a global representation of the graph by exchanging messages with their neighbors. Agents receive the resulting learned graph observations based on their location in the graph. Our approach can be used in a decentralized manner at runtime and in combination with a reinforcement learning algorithm of choice. We evaluate our method across 1000 diverse graphs in the context of routing in communication networks and find that it enables agents to generalize and adapt to changes in the graph.
- [1036] arXiv:2402.05044 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language ModelsComments: fix institution typoSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Abstract: In the rapidly evolving landscape of Large Language Models (LLMs), ensuring robust safety measures is paramount. To meet this crucial need, we propose \emph{SALAD-Bench}, a safety benchmark specifically designed for evaluating LLMs, attack, and defense methods. Distinguished by its breadth, SALAD-Bench transcends conventional benchmarks through its large scale, rich diversity, intricate taxonomy spanning three levels, and versatile functionalities.SALAD-Bench is crafted with a meticulous array of questions, from standard queries to complex ones enriched with attack, defense modifications and multiple-choice. To effectively manage the inherent complexity, we introduce an innovative evaluators: the LLM-based MD-Judge for QA pairs with a particular focus on attack-enhanced queries, ensuring a seamless, and reliable evaluation. Above components extend SALAD-Bench from standard LLM safety evaluation to both LLM attack and defense methods evaluation, ensuring the joint-purpose utility. Our extensive experiments shed light on the resilience of LLMs against emerging threats and the efficacy of contemporary defense tactics. Data and evaluator are released under this https URL .
- [1037] arXiv:2402.05106 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Image captioning for Brazilian Portuguese using GRIT modelComments: arXiv admin note: text overlap with arXiv:2207.09666 by other authorsSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: This work presents the early development of a model of image captioning for the Brazilian Portuguese language. We used the GRIT (Grid - and Region-based Image captioning Transformer) model to accomplish this work. GRIT is a Transformer-only neural architecture that effectively utilizes two visual features to generate better captions. The GRIT method emerged as a proposal to be a more efficient way to generate image captioning. In this work, we adapt the GRIT model to be trained in a Brazilian Portuguese dataset to have an image captioning method for the Brazilian Portuguese Language.
- [1038] arXiv:2402.05111 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Edu-ConvoKit: An Open-Source Library for Education Conversation DataSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: We introduce Edu-ConvoKit, an open-source library designed to handle pre-processing, annotation and analysis of conversation data in education. Resources for analyzing education conversation data are scarce, making the research challenging to perform and therefore hard to access. We address these challenges with Edu-ConvoKit. Edu-ConvoKit is open-source ( this https URL ), pip-installable ( this https URL ), with comprehensive documentation ( this https URL ). Our demo video is available at: this https URL . We include additional resources, such as Colab applications of Edu-ConvoKit to three diverse education datasets and a repository of Edu-ConvoKit related papers, that can be found in our GitHub repository.
- [1039] arXiv:2402.05115 (cross-list from cs.RO) [ pdf , ps , html , other ]
-
Title: Unsupervised Motion Retargeting for Human-Robot ImitationLouis Annabi (Flowers, U2IS), Ziqi Ma (U2IS), Sao Mai Nguyen (Lab-STICC_RAMBO, U2IS, Flowers, IMT Atlantique - INFO)Comments: Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interactio, Mar 2024, Boulder (CO), United StatesSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This early-stage research work aims to improve online human-robot imitation by translating sequences of joint positions from the domain of human motions to a domain of motions achievable by a given robot, thus constrained by its embodiment. Leveraging the generalization capabilities of deep learning methods, we address this problem by proposing an encoder-decoder neural network model performing domain-to-domain translation. In order to train such a model, one could use pairs of associated robot and human motions. Though, such paired data is extremely rare in practice, and tedious to collect. Therefore, we turn towards deep learning methods for unpaired domain-to-domain translation, that we adapt in order to perform human-robot imitation.
- [1040] arXiv:2402.05119 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: A Closer Look at the Limitations of Instruction TuningSreyan Ghosh , Chandra Kiran Reddy Evuru , Sonal Kumar , Ramaneswaran S , Deepali Aneja , Zeyu Jin , Ramani Duraiswami , Dinesh ManochaSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Instruction Tuning (IT), the process of training large language models (LLMs) using instruction-response pairs, has emerged as the predominant method for transforming base pre-trained LLMs into open-domain conversational agents. While IT has achieved notable success and widespread adoption, its limitations and shortcomings remain underexplored. In this paper, through rigorous experiments and an in-depth analysis of the changes LLMs undergo through IT, we reveal various limitations of IT. In particular, we show that (1) IT fails to enhance knowledge or skills in LLMs. LoRA fine-tuning is limited to learning response initiation and style tokens, and full-parameter fine-tuning leads to knowledge degradation. (2) Copying response patterns from IT datasets derived from knowledgeable sources leads to a decline in response quality. (3) Full-parameter fine-tuning increases hallucination by inaccurately borrowing tokens from conceptually similar instances in the IT dataset for generating responses. (4) Popular methods to improve IT do not lead to performance improvements over a simple LoRA fine-tuned model. Our findings reveal that responses generated solely from pre-trained knowledge consistently outperform responses by models that learn any form of new knowledge from IT on open-source datasets. We hope the insights and challenges revealed inspire future work.
- [1041] arXiv:2402.05120 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: More Agents Is All You NeedSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: We find that, simply via a sampling-and-voting method, the performance of large language models (LLMs) scales with the number of agents instantiated. Also, this method is orthogonal to existing complicated methods to further enhance LLMs, while the degree of enhancement is correlated to the task difficulty. We conduct comprehensive experiments on a wide range of LLM benchmarks to verify the presence of our finding, and to study the properties that can facilitate its occurrence. Our code is publicly available at: \url{https://anonymous.4open.science/r/more_agent_is_all_you_need}.
- [1042] arXiv:2402.05125 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Zero-Shot Clinical Trial Patient Matching with LLMsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Matching patients to clinical trials is a key unsolved challenge in bringing new drugs to market. Today, identifying patients who meet a trial's eligibility criteria is highly manual, taking up to 1 hour per patient. Automated screening is challenging, however, as it requires understanding unstructured clinical text. Large language models (LLMs) offer a promising solution. In this work, we explore their application to trial matching. First, we design an LLM-based system which, given a patient's medical history as unstructured clinical text, evaluates whether that patient meets a set of inclusion criteria (also specified as free text). Our zero-shot system achieves state-of-the-art scores on the n2c2 2018 cohort selection benchmark. Second, we improve the data and cost efficiency of our method by identifying a prompting strategy which matches patients an order of magnitude faster and more cheaply than the status quo, and develop a two-stage retrieval pipeline that reduces the number of tokens processed by up to a third while retaining high performance. Third, we evaluate the interpretability of our system by having clinicians evaluate the natural language justifications generated by the LLM for each eligibility decision, and show that it can output coherent explanations for 97% of its correct decisions and 75% of its incorrect ones. Our results establish the feasibility of using LLMs to accelerate clinical trial operations.
- [1043] arXiv:2402.05127 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Illuminate: A novel approach for depression detection with explainable analysis and proactive therapy using prompt engineeringComments: 10 pages, 9 figures, 9 tablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This paper introduces a novel paradigm for depression detection and treatment using advanced Large Language Models (LLMs): Generative Pre-trained Transformer 4 (GPT-4), Llama 2 chat, and Gemini. These LLMs are fine-tuned with specialized prompts to diagnose, explain, and suggest therapeutic interventions for depression. A unique few-shot prompting method enhances the models' ability to analyze and explain depressive symptoms based on the DSM-5 criteria. In the interaction phase, the models engage in empathetic dialogue management, drawing from resources like PsychDB and a Cognitive Behavioral Therapy (CBT) Guide, fostering supportive interactions with individuals experiencing major depressive disorders. Additionally, the research introduces the Illuminate Database, enriched with various CBT modules, aiding in personalized therapy recommendations. The study evaluates LLM performance using metrics such as F1 scores, Precision, Recall, Cosine similarity, and Recall-Oriented Understudy for Gisting Evaluation (ROUGE) across different test sets, demonstrating their effectiveness. This comprehensive approach blends cutting-edge AI with established psychological methods, offering new possibilities in mental health care and showcasing the potential of LLMs in revolutionizing depression diagnosis and treatment strategies.
- [1044] arXiv:2402.05128 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Enhancing Textbook Question Answering Task with Large Language Models and Retrieval Augmented GenerationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Textbook question answering (TQA) is a challenging task in artificial intelligence due to the complex nature of context and multimodal data. Although previous research has significantly improved the task, there are still some limitations including the models' weak reasoning and inability to capture contextual information in the lengthy context. The introduction of large language models (LLMs) has revolutionized the field of AI, however, directly applying LLMs often leads to inaccurate answers. This paper proposes a methodology that handle the out-of-domain scenario in TQA where concepts are spread across different lessons by incorporating the retrieval augmented generation (RAG) technique and utilize transfer learning to handle the long context and enhance reasoning abilities. Through supervised fine-tuning of the LLM model Llama-2 and the incorporation of RAG, our architecture outperforms the baseline, achieving a 4.12% accuracy improvement on validation set and 9.84% on test set for non-diagram multiple-choice questions.
- [1045] arXiv:2402.05130 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: LB-KBQA: Large-language-model and BERT based Knowledge-Based Question and Answering SystemSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Generative Artificial Intelligence (AI), because of its emergent abilities, has empowered various fields, one typical of which is large language models (LLMs). One of the typical application fields of Generative AI is large language models (LLMs), and the natural language understanding capability of LLM is dramatically improved when compared with conventional AI-based methods. The natural language understanding capability has always been a barrier to the intent recognition performance of the Knowledge-Based-Question-and-Answer (KBQA) system, which arises from linguistic diversity and the newly appeared intent. Conventional AI-based methods for intent recognition can be divided into semantic parsing-based and model-based approaches. However, both of the methods suffer from limited resources in intent recognition. To address this issue, we propose a novel KBQA system based on a Large Language Model(LLM) and BERT (LB-KBQA). With the help of generative AI, our proposed method could detect newly appeared intent and acquire new knowledge. In experiments on financial domain question answering, our model has demonstrated superior effectiveness.
- [1046] arXiv:2402.05133 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Personalized Language Modeling from Personalized Human FeedbackSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Reinforcement Learning from Human Feedback (RLHF) is the current dominating framework to fine-tune large language models to better align with human preferences. However, the underlying premise of algorithms developed under this framework can be problematic when user preferences encoded in human feedback are diverse. In this work, we aim to address this problem by developing methods for building personalized language models. We first formally introduce the task of learning from personalized human feedback and explain why vanilla RLHF can be problematic in this context. We then propose a general Personalized-RLHF (P-RLHF) framework, which requires one to jointly learn a user model and a language (or reward) model. The user model takes in user information and outputs user representations. Its structure encodes our assumptions about user preferences underlying the feedback data. We develop new learning objectives for personalized reward modeling and personalized Direct Preference Optimization. To demonstrate the efficacy of our method, we test it on real-world text summarization data with annotated preferences and annotator information. We fine-tune GPT-J 6B to obtain personalized language (and reward) models, which outperform non-personalized models in terms of aligning with individual preferences.
- [1047] arXiv:2402.05140 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Tag-LLM: Repurposing General-Purpose LLMs for Specialized DomainsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Large Language Models (LLMs) have demonstrated remarkable proficiency in understanding and generating natural language. However, their capabilities wane in highly specialized domains underrepresented in the pretraining corpus, such as physical and biomedical sciences. This work explores how to repurpose general LLMs into effective task solvers for specialized domains. We introduce a novel, model-agnostic framework for learning custom input tags, which are parameterized as continuous vectors appended to the LLM's embedding layer, to condition the LLM. We design two types of input tags: domain tags are used to delimit specialized representations (e.g., chemical formulas) and provide domain-relevant context; function tags are used to represent specific functions (e.g., predicting molecular properties) and compress function-solving instructions. We develop a three-stage protocol to learn these tags using auxiliary data and domain knowledge. By explicitly disentangling task domains from task functions, our method enables zero-shot generalization to unseen problems through diverse combinations of the input tags. It also boosts LLM's performance in various specialized domains, such as predicting protein or chemical properties and modeling drug-target interactions, outperforming expert models tailored to these tasks.
- [1048] arXiv:2402.05142 (cross-list from cs.SE) [ pdf , ps , other ]
-
Title: The Foundations of Computational Management: A Systematic Approach to Task Automation for the Integration of Artificial Intelligence into Existing WorkflowsComments: 29 pages, 3 appendicesSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: Driven by the rapid ascent of artificial intelligence (AI), organizations are at the epicenter of a seismic shift, facing a crucial question: How can AI be successfully integrated into existing operations? To help answer it, manage expectations and mitigate frustration, this article introduces Computational Management, a systematic approach to task automation for enhancing the ability of organizations to harness AI's potential within existing workflows. Computational Management acts as a bridge between the strategic insights of management science with the analytical rigor of computational thinking. The article offers three easy step-by-step procedures to begin the process of implementing AI within a workflow. Such procedures focus on task (re)formulation, on the assessment of the automation potential of tasks, on the completion of task specification templates for AI selection and adaptation. Included in the article there are manual and automated methods, with prompt suggestions for publicly available LLMs, to complete these three procedures. The first procedure, task (re)formulation, focuses on breaking down work activities into basic units, so they can be completed by one agent, involve a single well-defined action, and produce a distinct outcome. The second, allows the assessment of the granular task and its suitability for automation, using the Task Automation Index to rank tasks based on whether they have standardized input, well-defined rules, repetitiveness, data dependency, and objective outputs. The third, focuses on a task specification template which details information on 16 critical components of tasks, and can be used as a checklist to select or adapt the most suitable AI solution for integration into existing workflows. Computational Management provides a roadmap and a toolkit for humans and AI to thrive together, while enhancing organizational efficiency and innovation.
- [1049] arXiv:2402.05144 (cross-list from cs.NE) [ pdf , ps , html , other ]
-
Title: A Bandit Approach with Evolutionary Operators for Model SelectionMargaux Brégère (LPSM (UMR_8001), EDF R&D), Julie Keisler (CRIStAL, EDF R&D)Subjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Optimization and Control (math.OC)
Abstract: This paper formulates model selection as an infinite-armed bandit problem. The models are arms, and picking an arm corresponds to a partial training of the model (resource allocation). The reward is the accuracy of the selected model after its partial training. In this best arm identification problem, regret is the gap between the expected accuracy of the optimal model and that of the model finally chosen. We first consider a straightforward generalization of UCB-E to the stochastic infinite-armed bandit problem and show that, under basic assumptions, the expected regret order is $T^{-\alpha}$ for some $\alpha \in (0,1/5)$ and $T$ the number of resources to allocate. From this vanilla algorithm, we introduce the algorithm Mutant-UCB that incorporates operators from evolutionary algorithms. Tests carried out on three open source image classification data sets attest to the relevance of this novel combining approach, which outperforms the state-of-the-art for a fixed budget.
- [1050] arXiv:2402.05146 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Compressing Deep Reinforcement Learning Networks with a Dynamic Structured Pruning Method for Autonomous DrivingSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: Deep reinforcement learning (DRL) has shown remarkable success in complex autonomous driving scenarios. However, DRL models inevitably bring high memory consumption and computation, which hinders their wide deployment in resource-limited autonomous driving devices. Structured Pruning has been recognized as a useful method to compress and accelerate DRL models, but it is still challenging to estimate the contribution of a parameter (i.e., neuron) to DRL models. In this paper, we introduce a novel dynamic structured pruning approach that gradually removes a DRL model's unimportant neurons during the training stage. Our method consists of two steps, i.e. training DRL models with a group sparse regularizer and removing unimportant neurons with a dynamic pruning threshold. To efficiently train the DRL model with a small number of important neurons, we employ a neuron-importance group sparse regularizer. In contrast to conventional regularizers, this regularizer imposes a penalty on redundant groups of neurons that do not significantly influence the output of the DRL model. Furthermore, we design a novel structured pruning strategy to dynamically determine the pruning threshold and gradually remove unimportant neurons with a binary mask. Therefore, our method can remove not only redundant groups of neurons of the DRL model but also achieve high and robust performance. Experimental results show that the proposed method is competitive with existing DRL pruning methods on discrete control environments (i.e., CartPole-v1 and LunarLander-v2) and MuJoCo continuous environments (i.e., Hopper-v3 and Walker2D-v3). Specifically, our method effectively compresses $93\%$ neurons and $96\%$ weights of the DRL model in four challenging DRL environments with slight accuracy degradation.
- [1051] arXiv:2402.05148 (cross-list from cs.SY) [ pdf , ps , other ]
-
Title: Cost Optimized Scheduling in Modular Electrolysis PlantsSubjects: Systems and Control (eess.SY) ; Artificial Intelligence (cs.AI)
Abstract: In response to the global shift towards renewable energy resources, the production of green hydrogen through electrolysis is emerging as a promising solution. Modular electrolysis plants, designed for flexibility and scalability, offer a dynamic response to the increasing demand for hydrogen while accommodating the fluctuations inherent in renewable energy sources. However, optimizing their operation is challenging, especially when a large number of electrolysis modules needs to be coordinated, each with potentially different characteristics.
To address these challenges, this paper presents a decentralized scheduling model to optimize the operation of modular electrolysis plants using the Alternating Direction Method of Multipliers. The model aims to balance hydrogen production with fluctuating demand, to minimize the marginal Levelized Cost of Hydrogen (mLCOH), and to ensure adaptability to operational disturbances. A case study validates the accuracy of the model in calculating mLCOH values under nominal load conditions and demonstrates its responsiveness to dynamic changes, such as electrolyzer module malfunctions and scale-up scenarios. - [1052] arXiv:2402.05149 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: FlowPG: Action-constrained Policy Gradient with Normalizing FlowsJournal-ref: Thirty-seventh Conference on Neural Information Processing Systems. 2023Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Action-constrained reinforcement learning (ACRL) is a popular approach for solving safety-critical and resource-allocation related decision making problems. A major challenge in ACRL is to ensure agent taking a valid action satisfying constraints in each RL step. Commonly used approach of using a projection layer on top of the policy network requires solving an optimization program which can result in longer training time, slow convergence, and zero gradient problem. To address this, first we use a normalizing flow model to learn an invertible, differentiable mapping between the feasible action space and the support of a simple distribution on a latent variable, such as Gaussian. Second, learning the flow model requires sampling from the feasible action space, which is also challenging. We develop multiple methods, based on Hamiltonian Monte-Carlo and probabilistic sentential decision diagrams for such action sampling for convex and non-convex constraints. Third, we integrate the learned normalizing flow with the DDPG algorithm. By design, a well-trained normalizing flow will transform policy output into a valid action without requiring an optimization solver. Empirically, our approach results in significantly fewer constraint violations (upto an order-of-magnitude for several instances) and is multiple times faster on a variety of continuous control tasks.
- [1053] arXiv:2402.05151 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: CrashFormer: A Multimodal Architecture to Predict the Risk of CrashAmin Karimi Monsefi , Pouya Shiri , Ahmad Mohammadshirazi , Nastaran Karimi Monsefi , Ron Davies , Sobhan Moosavi , Rajiv RamnathComments: The paper is accepted In 1st ACM SIGSPATIAL International Workshop on Advances in Urban-AI (UrbanAI 23), November 13, 2023, Hamburg, GermanySubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Reducing traffic accidents is a crucial global public safety concern. Accident prediction is key to improving traffic safety, enabling proactive measures to be taken before a crash occurs, and informing safety policies, regulations, and targeted interventions. Despite numerous studies on accident prediction over the past decades, many have limitations in terms of generalizability, reproducibility, or feasibility for practical use due to input data or problem formulation. To address existing shortcomings, we propose CrashFormer, a multi-modal architecture that utilizes comprehensive (but relatively easy to obtain) inputs such as the history of accidents, weather information, map images, and demographic information. The model predicts the future risk of accidents on a reasonably acceptable cadence (i.e., every six hours) for a geographical location of 5.161 square kilometers. CrashFormer is composed of five components: a sequential encoder to utilize historical accidents and weather data, an image encoder to use map imagery data, a raw data encoder to utilize demographic information, a feature fusion module for aggregating the encoded features, and a classifier that accepts the aggregated data and makes predictions accordingly. Results from extensive real-world experiments in 10 major US cities show that CrashFormer outperforms state-of-the-art sequential and non-sequential models by 1.8% in F1-score on average when using ``sparse'' input data.
- [1054] arXiv:2402.05154 (cross-list from cs.SI) [ pdf , ps , html , other ]
-
Title: Adaptive Hypergraph Network for Trust PredictionSubjects: Social and Information Networks (cs.SI) ; Artificial Intelligence (cs.AI)
Abstract: Trust plays an essential role in an individual's decision-making. Traditional trust prediction models rely on pairwise correlations to infer potential relationships between users. However, in the real world, interactions between users are usually complicated rather than pairwise only. Hypergraphs offer a flexible approach to modeling these complex high-order correlations (not just pairwise connections), since hypergraphs can leverage hyperedeges to link more than two nodes. However, most hypergraph-based methods are generic and cannot be well applied to the trust prediction task. In this paper, we propose an Adaptive Hypergraph Network for Trust Prediction (AHNTP), a novel approach that improves trust prediction accuracy by using higher-order correlations. AHNTP utilizes Motif-based PageRank to capture high-order social influence information. In addition, it constructs hypergroups from both node-level and structure-level attributes to incorporate complex correlation information. Furthermore, AHNTP leverages adaptive hypergraph Graph Convolutional Network (GCN) layers and multilayer perceptrons (MLPs) to generate comprehensive user embeddings, facilitating trust relationship prediction. To enhance model generalization and robustness, we introduce a novel supervised contrastive learning loss for optimization. Extensive experiments demonstrate the superiority of our model over the state-of-the-art approaches in terms of trust prediction accuracy. The source code of this work can be accessed via this https URL .
- [1055] arXiv:2402.05156 (cross-list from cs.DL) [ pdf , ps , html , other ]
-
Title: What About the Data? A Mapping Study on Data Engineering for AI SystemsComments: Preprint, accepted for CAIN24Subjects: Digital Libraries (cs.DL) ; Artificial Intelligence (cs.AI); Databases (cs.DB)
Abstract: AI systems cannot exist without data. Now that AI models (data science and AI) have matured and are readily available to apply in practice, most organizations struggle with the data infrastructure to do so. There is a growing need for data engineers that know how to prepare data for AI systems or that can setup enterprise-wide data architectures for analytical projects. But until now, the data engineering part of AI engineering has not been getting much attention, in favor of discussing the modeling part. In this paper we aim to change this by perform a mapping study on data engineering for AI systems, i.e., AI data engineering. We found 25 relevant papers between January 2019 and June 2023, explaining AI data engineering activities. We identify which life cycle phases are covered, which technical solutions or architectures are proposed and which lessons learned are presented. We end by an overall discussion of the papers with implications for practitioners and researchers. This paper creates an overview of the body of knowledge on data engineering for AI. This overview is useful for practitioners to identify solutions and best practices as well as for researchers to identify gaps.
- [1056] arXiv:2402.05158 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Enhancement of Bengali OCR by Specialized Models and Advanced Techniques for Diverse Document TypesComments: 8 pages, 7 figures, 4 table Link of the paper this https URLJournal-ref: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops, 2024, pp. 1102-1109Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This research paper presents a unique Bengali OCR system with some capabilities. The system excels in reconstructing document layouts while preserving structure, alignment, and images. It incorporates advanced image and signature detection for accurate extraction. Specialized models for word segmentation cater to diverse document types, including computer-composed, letterpress, typewriter, and handwritten documents. The system handles static and dynamic handwritten inputs, recognizing various writing styles. Furthermore, it has the ability to recognize compound characters in Bengali. Extensive data collection efforts provide a diverse corpus, while advanced technical components optimize character and word recognition. Additional contributions include image, logo, signature and table recognition, perspective correction, layout reconstruction, and a queuing module for efficient and scalable processing. The system demonstrates outstanding performance in efficient and accurate text extraction and analysis.
- [1057] arXiv:2402.05160 (cross-list from cs.SE) [ pdf , ps , html , other ]
-
Title: What's documented in AI? Systematic Analysis of 32K AI Model CardsWeixin Liang , Nazneen Rajani , Xinyu Yang , Ezinwanne Ozoani , Eric Wu , Yiqun Chen , Daniel Scott Smith , James ZouSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The rapid proliferation of AI models has underscored the importance of thorough documentation, as it enables users to understand, trust, and effectively utilize these models in various applications. Although developers are encouraged to produce model cards, it's not clear how much information or what information these cards contain. In this study, we conduct a comprehensive analysis of 32,111 AI model documentations on Hugging Face, a leading platform for distributing and deploying AI models. Our investigation sheds light on the prevailing model card documentation practices. Most of the AI models with substantial downloads provide model cards, though the cards have uneven informativeness. We find that sections addressing environmental impact, limitations, and evaluation exhibit the lowest filled-out rates, while the training section is the most consistently filled-out. We analyze the content of each section to characterize practitioners' priorities. Interestingly, there are substantial discussions of data, sometimes with equal or even greater emphasis than the model itself. To evaluate the impact of model cards, we conducted an intervention study by adding detailed model cards to 42 popular models which had no or sparse model cards previously. We find that adding model cards is moderately correlated with an increase weekly download rates. Our study opens up a new perspective for analyzing community norms and practices for model documentation through large-scale data science and linguistics analysis.
- [1058] arXiv:2402.05162 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank ModificationsBoyi Wei , Kaixuan Huang , Yangsibo Huang , Tinghao Xie , Xiangyu Qi , Mengzhou Xia , Prateek Mittal , Mengdi Wang , Peter HendersonComments: 22 pages, 9 figures. Project page is available at this https URLSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Large language models (LLMs) show inherent brittleness in their safety mechanisms, as evidenced by their susceptibility to jailbreaking and even non-malicious fine-tuning. This study explores this brittleness of safety alignment by leveraging pruning and low-rank modifications. We develop methods to identify critical regions that are vital for safety guardrails, and that are disentangled from utility-relevant regions at both the neuron and rank levels. Surprisingly, the isolated regions we find are sparse, comprising about $3\%$ at the parameter level and $2.5\%$ at the rank level. Removing these regions compromises safety without significantly impacting utility, corroborating the inherent brittleness of the model's safety mechanisms. Moreover, we show that LLMs remain vulnerable to low-cost fine-tuning attacks even when modifications to the safety-critical regions are restricted. These findings underscore the urgent need for more robust safety strategies in LLMs.
- [1059] arXiv:2402.05164 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: A Resource Model For Neural Scaling LawComments: 10 pages, 8 figures, Under review as a workshop paper at ICLR 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Abstract: Neural scaling laws characterize how model performance improves as the model size scales up. Inspired by empirical observations, we introduce a resource model of neural scaling. A task is usually composite hence can be decomposed into many subtasks, which compete for resources (measured by the number of neurons allocated to subtasks). On toy problems, we empirically find that: (1) The loss of a subtask is inversely proportional to its allocated neurons. (2) When multiple subtasks are present in a composite task, the resources acquired by each subtask uniformly grow as models get larger, keeping the ratios of acquired resources constants. We hypothesize these findings to be generally true and build a model to predict neural scaling laws for general composite tasks, which successfully replicates the neural scaling law of Chinchilla models reported in arXiv:2203.15556 . We believe that the notion of resource used in this paper will be a useful tool for characterizing and diagnosing neural networks.
- [1060] arXiv:2402.05188 (cross-list from cs.RO) [ pdf , ps , html , other ]
-
Title: InCoRo: In-Context Learning for Robotics Control with Feedback LoopsSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: One of the challenges in robotics is to enable robotic units with the reasoning capability that would be robust enough to execute complex tasks in dynamic environments. Recent advances in LLMs have positioned them as go-to tools for simple reasoning tasks, motivating the pioneering work of Liang et al. [35] that uses an LLM to translate natural language commands into low-level static execution plans for robotic units. Using LLMs inside robotics systems brings their generalization to a new level, enabling zero-shot generalization to new tasks. This paper extends this prior work to dynamic environments. We propose InCoRo, a system that uses a classical robotic feedback loop composed of an LLM controller, a scene understanding unit, and a robot. Our system continuously analyzes the state of the environment and provides adapted execution commands, enabling the robot to adjust to changing environmental conditions and correcting for controller errors. Our system does not require any iterative optimization to learn to accomplish a task as it leverages in-context learning with an off-the-shelf LLM model. Through an extensive validation process involving two standardized industrial robotic units -- SCARA and DELTA types -- we contribute knowledge about these robots, not popular in the community, thereby enriching it. We highlight the generalization capabilities of our system and show that (1) in-context learning in combination with the current state-of-the-art LLMs is an effective way to implement a robotic controller; (2) in static environments, InCoRo surpasses the prior art in terms of the success rate; (3) in dynamic environments, we establish new state-of-the-art for the SCARA and DELTA units, respectively. This research paves the way towards building reliable, efficient, intelligent autonomous systems that adapt to dynamic environments.
- [1061] arXiv:2402.05200 (cross-list from cond-mat.mtrl-sci) [ pdf , ps , html , other ]
-
Title: Are LLMs Ready for Real-World Materials Discovery?Subjects: Materials Science (cond-mat.mtrl-sci) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Large Language Models (LLMs) create exciting possibilities for powerful language processing tools to accelerate research in materials science. While LLMs have great potential to accelerate materials understanding and discovery, they currently fall short in being practical materials science tools. In this position paper, we show relevant failure cases of LLMs in materials science that reveal current limitations of LLMs related to comprehending and reasoning over complex, interconnected materials science knowledge. Given those shortcomings, we outline a framework for developing Materials Science LLMs (MatSci-LLMs) that are grounded in materials science knowledge and hypothesis generation followed by hypothesis testing. The path to attaining performant MatSci-LLMs rests in large part on building high-quality, multi-modal datasets sourced from scientific literature where various information extraction challenges persist. As such, we describe key materials science information extraction challenges which need to be overcome in order to build large-scale, multi-modal datasets that capture valuable materials science knowledge. Finally, we outline a roadmap for applying future MatSci-LLMs for real-world materials discovery via: 1. Automated Knowledge Base Generation; 2. Automated In-Silico Material Design; and 3. MatSci-LLM Integrated Self-Driving Materials Laboratories.
- [1062] arXiv:2402.05201 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: The Effect of Sampling Temperature on Problem Solving in Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In this research study, we empirically investigate the effect of sampling temperature on the performance of Large Language Models (LLMs) on various problem-solving tasks. We created a multiple-choice question-and-answer (MCQA) exam by randomly sampling problems from standard LLM benchmarks. Then, we used four popular LLMs with five prompt-engineering techniques to solve the MCQA problems while increasing the sampling temperature from 0.0 to 1.0. Despite anecdotal reports to the contrary, our empirical results indicate that changes in temperature in the range 0.0 to 1.0 do not have a statistically significant impact on LLM performance for problem-solving tasks. In addition, these results appear to hold regardless of the LLM, the prompt-engineering technique, or the problem domain. All code, data, and supplemental materials are available on GitHub at: this https URL .
- [1063] arXiv:2402.05224 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: VerAs: Verify then Assess STEM Lab ReportsComments: It is accepted to AIED2024!Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: With an increasing focus in STEM education on critical thinking skills, science writing plays an ever more important role in curricula that stress inquiry skills. A recently published dataset of two sets of college level lab reports from an inquiry-based physics curriculum relies on analytic assessment rubrics that utilize multiple dimensions, specifying subject matter knowledge and general components of good explanations. Each analytic dimension is assessed on a 6-point scale, to provide detailed feedback to students that can help them improve their science writing skills. Manual assessment can be slow, and difficult to calibrate for consistency across all students in large classes. While much work exists on automated assessment of open-ended questions in STEM subjects, there has been far less work on long-form writing such as lab reports. We present an end-to-end neural architecture that has separate verifier and assessment modules, inspired by approaches to Open Domain Question Answering (OpenQA). VerAs first verifies whether a report contains any content relevant to a given rubric dimension, and if so, assesses the relevant sentences. On the lab reports, VerAs outperforms multiple baselines based on OpenQA systems or Automated Essay Scoring (AES). VerAs also performs well on an analytic rubric for middle school physics essays.
- [1064] arXiv:2402.05232 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Universal Neural FunctionalsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: A challenging problem in many modern machine learning tasks is to process weight-space features, i.e., to transform or extract information from the weights and gradients of a neural network. Recent works have developed promising weight-space models that are equivariant to the permutation symmetries of simple feedforward networks. However, they are not applicable to general architectures, since the permutation symmetries of a weight space can be complicated by recurrence or residual connections. This work proposes an algorithm that automatically constructs permutation equivariant models, which we refer to as universal neural functionals (UNFs), for any weight space. Among other applications, we demonstrate how UNFs can be substituted into existing learned optimizer designs, and find promising improvements over prior methods when optimizing small image classifiers and language models. Our results suggest that learned optimizers can benefit from considering the (symmetry) structure of the weight space they optimize. We open-source our library for constructing UNFs at this https URL .
- [1065] arXiv:2402.05252 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Learning Fair Ranking Policies via Differentiable Optimization of Ordered Weighted AveragesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: Learning to Rank (LTR) is one of the most widely used machine learning applications. It is a key component in platforms with profound societal impacts, including job search, healthcare information retrieval, and social media content feeds. Conventional LTR models have been shown to produce biases results, stimulating a discourse on how to address the disparities introduced by ranking systems that solely prioritize user relevance. However, while several models of fair learning to rank have been proposed, they suffer from deficiencies either in accuracy or efficiency, thus limiting their applicability to real-world ranking platforms. This paper shows how efficiently-solvable fair ranking models, based on the optimization of Ordered Weighted Average (OWA) functions, can be integrated into the training loop of an LTR model to achieve favorable balances between fairness, user utility, and runtime efficiency. In particular, this paper is the first to show how to backpropagate through constrained optimizations of OWA objectives, enabling their use in integrated prediction and decision models.
- [1066] arXiv:2402.05271 (cross-list from stat.ML) [ pdf , ps , html , other ]
-
Title: Gradient descent induces alignment between weights and the empirical NTK for deep non-linear networksSubjects: Machine Learning (stat.ML) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Understanding the mechanisms through which neural networks extract statistics from input-label pairs is one of the most important unsolved problems in supervised learning. Prior works have identified that the gram matrices of the weights in trained neural networks of general architectures are proportional to the average gradient outer product of the model, in a statement known as the Neural Feature Ansatz (NFA). However, the reason these quantities become correlated during training is poorly understood. In this work, we explain the emergence of this correlation. We identify that the NFA is equivalent to alignment between the left singular structure of the weight matrices and a significant component of the empirical neural tangent kernels associated with those weights. We establish that the NFA introduced in prior works is driven by a centered NFA that isolates this alignment. We show that the speed of NFA development can be predicted analytically at early training times in terms of simple statistics of the inputs and labels. Finally, we introduce a simple intervention to increase NFA correlation at any given layer, which dramatically improves the quality of features learned.
- [1067] arXiv:2402.05290 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Do Transformer World Models Give Better Policy Gradients?Comments: Michel Ma and Pierluca D'Oro contributed equallySubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: A natural approach for reinforcement learning is to predict future rewards by unrolling a neural network world model, and to backpropagate through the resulting computational graph to learn a policy. However, this method often becomes impractical for long horizons since typical world models induce hard-to-optimize loss landscapes. Transformers are known to efficiently propagate gradients over long horizons: could they be the solution to this problem? Surprisingly, we show that commonly-used transformer world models produce circuitous gradient paths, which can be detrimental to long-range policy gradients. To tackle this challenge, we propose a class of world models called Actions World Models (AWMs), designed to provide more direct routes for gradient propagation. We integrate such AWMs into a policy gradient framework that underscores the relationship between network architectures and the policy gradient updates they inherently represent. We demonstrate that AWMs can generate optimization landscapes that are easier to navigate even when compared to those from the simulator itself. This property allows transformer AWMs to produce better policies than competitive baselines in realistic long-horizon tasks.
- [1068] arXiv:2402.05293 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: A comparative study on feature selection for a risk prediction model for colorectal cancerN. Cueto-López , M. T. García-Ordás , V. Dávila-Batista , V. Moreno , N. Aragonés , R. Alaiz-RodríguezSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Background and objective
Risk prediction models aim at identifying people at higher risk of developing a target disease. Feature selection is particularly important to improve the prediction model performance avoiding overfitting and to identify the leading cancer risk (and protective) factors. Assessing the stability of feature selection/ranking algorithms becomes an important issue when the aim is to analyze the features with more prediction power. Methods
This work is focused on colorectal cancer, assessing several feature ranking algorithms in terms of performance for a set of risk prediction models (Neural Networks, Support Vector Machines (SVM), Logistic Regression, k-Nearest Neighbors and Boosted Trees). Additionally, their robustness is evaluated following a conventional approach with scalar stability metrics and a visual approach proposed in this work to study both similarity among feature ranking techniques as well as their individual stability. A comparative analysis is carried out between the most relevant features found out in this study and features provided by the experts according to the state-of-the-art knowledge. Results
The two best performance results in terms of Area Under the ROC Curve (AUC) are achieved with a SVM classifier using the top-41 features selected by the SVM wrapper approach (AUC=0.693) and Logistic Regression with the top-40 features selected by the Pearson (AUC=0.689). Experiments showed that performing feature selection contributes to classification performance with a 3.9% and 1.9% improvement in AUC for the SVM and Logistic Regression classifier, respectively, with respect to the results using the full feature set. The visual approach proposed in this work allows to see that the Neural Network-based wrapper ranking is the most unstable while the Random Forest is the most stable. - [1069] arXiv:2402.05294 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Examining Modality Incongruity in Multimodal Federated Learning for Medical Vision and Language-based Disease DetectionComments: 42 pagesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Multimodal Federated Learning (MMFL) utilizes multiple modalities in each client to build a more powerful Federated Learning (FL) model than its unimodal counterpart. However, the impact of missing modality in different clients, also called modality incongruity, has been greatly overlooked. This paper, for the first time, analyses the impact of modality incongruity and reveals its connection with data heterogeneity across participating clients. We particularly inspect whether incongruent MMFL with unimodal and multimodal clients is more beneficial than unimodal FL. Furthermore, we examine three potential routes of addressing this issue. Firstly, we study the effectiveness of various self-attention mechanisms towards incongruity-agnostic information fusion in MMFL. Secondly, we introduce a modality imputation network (MIN) pre-trained in a multimodal client for modality translation in unimodal clients and investigate its potential towards mitigating the missing modality problem. Thirdly, we assess the capability of client-level and server-level regularization techniques towards mitigating modality incongruity effects. Experiments are conducted under several MMFL settings on two publicly available real-world datasets, MIMIC-CXR and Open-I, with Chest X-Ray and radiology reports.
- [1070] arXiv:2402.05295 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: An information theoretic approach to quantify the stability of feature selection and ranking algorithmsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Feature selection is a key step when dealing with high dimensional data. In particular, these techniques simplify the process of knowledge discovery from the data by selecting the most relevant features out of the noisy, redundant and irrelevant features. A problem that arises in many of these practical applications is that the outcome of the feature selection algorithm is not stable. Thus, small variations in the data may yield very different feature rankings. Assessing the stability of these methods becomes an important issue in the previously mentioned situations. We propose an information theoretic approach based on the Jensen Shannon divergence to quantify this robustness. Unlike other stability measures, this metric is suitable for different algorithm outcomes: full ranked lists, feature subsets as well as the lesser studied partial ranked lists. This generalized metric quantifies the difference among a whole set of lists with the same size, following a probabilistic approach and being able to give more importance to the disagreements that appear at the top of the list. Moreover, it possesses desirable properties including correction for change, upper lower bounds and conditions for a deterministic selection. We illustrate the use of this stability metric with data generated in a fully controlled way and compare it with popular metrics including the Spearmans rank correlation and the Kunchevas index on feature ranking and selection outcomes, respectively. Additionally, experimental validation of the proposed approach is carried out on a real-world problem of food quality assessment showing its potential to quantify stability from different perspectives.
- [1071] arXiv:2402.05296 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Classifying spam emails using agglomerative hierarchical clustering and a topic-based approachSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Spam emails are unsolicited, annoying and sometimes harmful messages which may contain malware, phishing or hoaxes. Unlike most studies that address the design of efficient anti-spam filters, we approach the spam email problem from a different and novel perspective. Focusing on the needs of cybersecurity units, we follow a topic-based approach for addressing the classification of spam email into multiple categories. We propose SPEMC-15K-E and SPEMC-15K-S, two novel datasets with approximately 15K emails each in English and Spanish, respectively, and we label them using agglomerative hierarchical clustering into 11 classes. We evaluate 16 pipelines, combining four text representation techniques -Term Frequency-Inverse Document Frequency (TF-IDF), Bag of Words, Word2Vec and BERT- and four classifiers: Support Vector Machine, Näive Bayes, Random Forest and Logistic Regression. Experimental results show that the highest performance is achieved with TF-IDF and LR for the English dataset, with a F1 score of 0.953 and an accuracy of 94.6%, and while for the Spanish dataset, TF-IDF with NB yields a F1 score of 0.945 and 98.5% accuracy. Regarding the processing time, TF-IDF with LR leads to the fastest classification, processing an English and Spanish spam email in and on average, respectively.
- [1072] arXiv:2402.05301 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: BIKED++: A Multimodal Dataset of 1.4 Million Bicycle Image and Parametric CAD DesignsSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This paper introduces a public dataset of 1.4 million procedurally-generated bicycle designs represented parametrically, as JSON files, and as rasterized images. The dataset is created through the use of a rendering engine which harnesses the BikeCAD software to generate vector graphics from parametric designs. This rendering engine is discussed in the paper and also released publicly alongside the dataset. Though this dataset has numerous applications, a principal motivation is the need to train cross-modal predictive models between parametric and image-based design representations. For example, we demonstrate that a predictive model can be trained to accurately estimate Contrastive Language-Image Pretraining (CLIP) embeddings from a parametric representation directly. This allows similarity relations to be established between parametric bicycle designs and text strings or reference images. Trained predictive models are also made public. The dataset joins the BIKED dataset family which includes thousands of mixed-representation human-designed bicycle models and several datasets quantifying design performance. The code and dataset can be found at: this https URL
- [1073] arXiv:2402.05306 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Sym-Q: Adaptive Symbolic Regression via Sequential Decision-MakingSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Symbolic regression holds great potential for uncovering underlying mathematical and physical relationships from empirical data. While existing transformer-based models have recently achieved significant success in this domain, they face challenges in terms of generalizability and adaptability. Typically, in cases where the output expressions do not adequately fit experimental data, the models lack efficient mechanisms to adapt or modify the expression. This inflexibility hinders their application in real-world scenarios, particularly in discovering unknown physical or biological relationships. Inspired by how human experts refine and adapt expressions, we introduce Symbolic Q-network (Sym-Q), a novel reinforcement learning-based model that redefines symbolic regression as a sequential decision-making task. Sym-Q leverages supervised demonstrations and refines expressions based on reward signals indicating the quality of fitting precision. Its distinctive ability to manage the complexity of expression trees and perform precise step-wise updates significantly enhances flexibility and efficiency. Our results demonstrate that Sym-Q excels not only in recovering underlying mathematical structures but also uniquely learns to efficiently refine the output expression based on reward signals, thereby discovering underlying expressions. Sym-Q paves the way for more intuitive and impactful discoveries in physical science, marking a substantial advancement in the field of symbolic regression.
- [1074] arXiv:2402.05322 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Learning on Multimodal Graphs: A SurveyComments: 9 pages, 1 figureSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Graphics (cs.GR); Social and Information Networks (cs.SI)
Abstract: Multimodal data pervades various domains, including healthcare, social media, and transportation, where multimodal graphs play a pivotal role. Machine learning on multimodal graphs, referred to as multimodal graph learning (MGL), is essential for successful artificial intelligence (AI) applications. The burgeoning research in this field encompasses diverse graph data types and modalities, learning techniques, and application scenarios. This survey paper conducts a comparative analysis of existing works in multimodal graph learning, elucidating how multimodal learning is achieved across different graph types and exploring the characteristics of prevalent learning techniques. Additionally, we delineate significant applications of multimodal graph learning and offer insights into future directions in this domain. Consequently, this paper serves as a foundational resource for researchers seeking to comprehend existing MGL techniques and their applicability across diverse scenarios.
- [1075] arXiv:2402.05355 (cross-list from cs.CY) [ pdf , ps , html , other ]
-
Title: A Survey on Safe Multi-Modal Learning SystemSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: In the rapidly evolving landscape of artificial intelligence, multimodal learning systems (MMLS) have gained traction for their ability to process and integrate information from diverse modality inputs. Their expanding use in vital sectors such as healthcare has made safety assurance a critical concern. However, the absence of systematic research into their safety is a significant barrier to progress in this field. To bridge the gap, we present the first taxonomy that systematically categorizes and assesses MMLS safety. This taxonomy is structured around four fundamental pillars that are critical to ensuring the safety of MMLS: robustness, alignment, monitoring, and controllability. Leveraging this taxonomy, we review existing methodologies, benchmarks, and the current state of research, while also pinpointing the principal limitations and gaps in knowledge. Finally, we discuss unique challenges in MMLS safety. In illuminating these challenges, we aim to pave the way for future research, proposing potential directions that could lead to significant advancements in the safety protocols of MMLS.
- [1076] arXiv:2402.05370 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Attention as Robust Representation for Time Series ForecastingSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Time series forecasting is essential for many practical applications, with the adoption of transformer-based models on the rise due to their impressive performance in NLP and CV. Transformers' key feature, the attention mechanism, dynamically fusing embeddings to enhance data representation, often relegating attention weights to a byproduct role. Yet, time series data, characterized by noise and non-stationarity, poses significant forecasting challenges. Our approach elevates attention weights as the primary representation for time series, capitalizing on the temporal relationships among data points to improve forecasting accuracy. Our study shows that an attention map, structured using global landmarks and local windows, acts as a robust kernel representation for data points, withstanding noise and shifts in distribution. Our method outperforms state-of-the-art models, reducing mean squared error (MSE) in multivariate time series forecasting by a notable 3.6% without altering the core neural network architecture. It serves as a versatile component that can readily replace recent patching based embedding schemes in transformer-based models, boosting their performance.
- [1077] arXiv:2402.05374 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: CIC: A framework for Culturally-aware Image CaptioningComments: Accepted in IJCAI 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Image Captioning generates descriptive sentences from images using Vision-Language Pre-trained models (VLPs) such as BLIP, which has improved greatly. However, current methods lack the generation of detailed descriptive captions for the cultural elements depicted in the images, such as the traditional clothing worn by people from Asian cultural groups. In this paper, we propose a new framework, \textbf{Culturally-aware Image Captioning (CIC)}, that generates captions and describes cultural elements extracted from cultural visual elements in images representing cultures. Inspired by methods combining visual modality and Large Language Models (LLMs) through appropriate prompts, our framework (1) generates questions based on cultural categories from images, (2) extracts cultural visual elements from Visual Question Answering (VQA) using generated questions, and (3) generates culturally-aware captions using LLMs with the prompts. Our human evaluation conducted on 45 participants from 4 different cultural groups with a high understanding of the corresponding culture shows that our proposed framework generates more culturally descriptive captions when compared to the image captioning baseline based on VLPs. Our code and dataset will be made publicly available upon acceptance.
- [1078] arXiv:2402.05378 (cross-list from eess.SP) [ pdf , ps , html , other ]
-
Title: Graph Neural Networks for Physical-Layer Security in Multi-User Flexible-Duplex NetworksSubjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Abstract: This paper explores Physical-Layer Security (PLS) in Flexible Duplex (FlexD) networks, considering scenarios involving eavesdroppers. Our investigation revolves around the intricacies of the sum secrecy rate maximization problem, particularly when faced with coordinated and distributed eavesdroppers employing a Minimum Mean Square Error (MMSE) receiver. Our contributions include an iterative classical optimization solution and an unsupervised learning strategy based on Graph Neural Networks (GNNs). To the best of our knowledge, this work marks the initial exploration of GNNs for PLS applications. Additionally, we extend the GNN approach to address the absence of eavesdroppers' channel knowledge. Extensive numerical simulations highlight FlexD's superiority over Half-Duplex (HD) communications and the GNN approach's superiority over the classical method in both performance and time complexity.
- [1079] arXiv:2402.05396 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: TASER: Temporal Adaptive Sampling for Fast and Accurate Dynamic Graph Representation LearningGangda Deng , Hongkuan Zhou , Hanqing Zeng , Yinglong Xia , Christopher Leung , Jianbo Li , Rajgopal Kannan , Viktor PrasannaComments: IPDPS 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Recently, Temporal Graph Neural Networks (TGNNs) have demonstrated state-of-the-art performance in various high-impact applications, including fraud detection and content recommendation. Despite the success of TGNNs, they are prone to the prevalent noise found in real-world dynamic graphs like time-deprecated links and skewed interaction distribution. The noise causes two critical issues that significantly compromise the accuracy of TGNNs: (1) models are supervised by inferior interactions, and (2) noisy input induces high variance in the aggregated messages. However, current TGNN denoising techniques do not consider the diverse and dynamic noise pattern of each node. In addition, they also suffer from the excessive mini-batch generation overheads caused by traversing more neighbors. We believe the remedy for fast and accurate TGNNs lies in temporal adaptive sampling. In this work, we propose TASER, the first adaptive sampling method for TGNNs optimized for accuracy, efficiency, and scalability. TASER adapts its mini-batch selection based on training dynamics and temporal neighbor selection based on the contextual, structural, and temporal properties of past interactions. To alleviate the bottleneck in mini-batch generation, TASER implements a pure GPU-based temporal neighbor finder and a dedicated GPU feature cache. We evaluate the performance of TASER using two state-of-the-art backbone TGNNs. On five popular datasets, TASER outperforms the corresponding baselines by an average of 2.3% in Mean Reciprocal Rank (MRR) while achieving an average of 5.1x speedup in training time.
- [1080] arXiv:2402.05399 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: CURE: Simulation-Augmented Auto-Tuning in RoboticsComments: Submitted in IEEE Transactions on Robotics (T-RO), 2024Subjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: Robotic systems are typically composed of various subsystems, such as localization and navigation, each encompassing numerous configurable components (e.g., selecting different planning algorithms). Once an algorithm has been selected for a component, its associated configuration options must be set to the appropriate values. Configuration options across the system stack interact non-trivially. Finding optimal configurations for highly configurable robots to achieve desired performance poses a significant challenge due to the interactions between configuration options across software and hardware that result in an exponentially large and complex configuration space. These challenges are further compounded by the need for transferability between different environments and robotic platforms. Data efficient optimization algorithms (e.g., Bayesian optimization) have been increasingly employed to automate the tuning of configurable parameters in cyber-physical systems. However, such optimization algorithms converge at later stages, often after exhausting the allocated budget (e.g., optimization steps, allotted time) and lacking transferability. This paper proposes CURE -- a method that identifies causally relevant configuration options, enabling the optimization process to operate in a reduced search space, thereby enabling faster optimization of robot performance. CURE abstracts the causal relationships between various configuration options and robot performance objectives by learning a causal model in the source (a low-cost environment such as the Gazebo simulator) and applying the learned knowledge to perform optimization in the target (e.g., Turtlebot 3 physical robot). We demonstrate the effectiveness and transferability of CURE by conducting experiments that involve varying degrees of deployment changes in both physical robots and simulation.
- [1081] arXiv:2402.05403 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: In-Context Principle Learning from MistakesTianjun Zhang , Aman Madaan , Luyu Gao , Steven Zheng , Swaroop Mishra , Yiming Yang , Niket Tandon , Uri AlonSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In-context learning (ICL, also known as few-shot prompting) has been the standard method of adapting LLMs to downstream tasks, by learning from a few input-output examples. Nonetheless, all ICL-based approaches only learn from correct input-output pairs. In this paper, we revisit this paradigm, by learning more from the few given input-output examples. We introduce Learning Principles (LEAP): First, we intentionally induce the model to make mistakes on these few examples; then we reflect on these mistakes, and learn explicit task-specific "principles" from them, which help solve similar problems and avoid common mistakes; finally, we prompt the model to answer unseen test questions using the original few-shot examples and these learned general principles. We evaluate LEAP on a wide range of benchmarks, including multi-hop question answering (Hotpot QA), textual QA (DROP), Big-Bench Hard reasoning, and math problems (GSM8K and MATH); in all these benchmarks, LEAP improves the strongest available LLMs such as GPT-3.5-turbo, GPT-4, GPT-4 turbo and Claude-2.1. For example, LEAP improves over the standard few-shot prompting using GPT-4 by 7.5% in DROP, and by 3.3% in HotpotQA. Importantly, LEAP does not require any more input or examples than the standard few-shot prompting settings.
- [1082] arXiv:2402.05421 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: DiffTOP: Differentiable Trajectory Optimization for Deep Reinforcement and Imitation LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: This paper introduces DiffTOP, which utilizes Differentiable Trajectory OPtimization as the policy representation to generate actions for deep reinforcement and imitation learning. Trajectory optimization is a powerful and widely used algorithm in control, parameterized by a cost and a dynamics function. The key to our approach is to leverage the recent progress in differentiable trajectory optimization, which enables computing the gradients of the loss with respect to the parameters of trajectory optimization. As a result, the cost and dynamics functions of trajectory optimization can be learned end-to-end. DiffTOP addresses the ``objective mismatch'' issue of prior model-based RL algorithms, as the dynamics model in DiffTOP is learned to directly maximize task performance by differentiating the policy gradient loss through the trajectory optimization process. We further benchmark DiffTOP for imitation learning on standard robotic manipulation task suites with high-dimensional sensory observations and compare our method to feed-forward policy classes as well as Energy-Based Models (EBM) and Diffusion. Across 15 model-based RL tasks and 13 imitation learning tasks with high-dimensional image and point cloud inputs, DiffTOP outperforms prior state-of-the-art methods in both domains.
- [1083] arXiv:2402.05428 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Mixture Density Networks for Classification with an Application to Product BundlingSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: While mixture density networks (MDNs) have been extensively used for regression tasks, they have not been used much for classification tasks. One reason for this is that the usability of MDNs for classification is not clear and straightforward. In this paper, we propose two MDN-based models for classification tasks. Both models fit mixtures of Gaussians to the the data and use the fitted distributions to classify a given sample by evaluating the learnt cumulative distribution function for the given input features. While the proposed MDN-based models perform slightly better than, or on par with, five baseline classification models on three publicly available datasets, the real utility of our models comes out through a real-world product bundling application. Specifically, we use our MDN-based models to learn the willingness-to-pay (WTP) distributions for two products from synthetic sales data of the individual products. The Gaussian mixture representation of the learnt WTP distributions is then exploited to obtain the WTP distribution of the bundle consisting of both the products. The proposed MDN-based models are able to approximate the true WTP distributions of both products and the bundle well.
- [1084] arXiv:2402.05435 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: GPT-4 Generated Narratives of Life Events using a Structured Narrative Prompt: A Validation StudyChristopher J. Lynch , Erik Jensen , Madison H. Munro , Virginia Zamponi , Joseph Martinez , Kevin O'Brien , Brandon Feldhaus , Katherine Smith , Ann Marie Reinhold , Ross GoreComments: 29 pages, 24 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Large Language Models (LLMs) play a pivotal role in generating vast arrays of narratives, facilitating a systematic exploration of their effectiveness for communicating life events in narrative form. In this study, we employ a zero-shot structured narrative prompt to generate 24,000 narratives using OpenAI's GPT-4. From this dataset, we manually classify 2,880 narratives and evaluate their validity in conveying birth, death, hiring, and firing events. Remarkably, 87.43% of the narratives sufficiently convey the intention of the structured prompt. To automate the identification of valid and invalid narratives, we train and validate nine Machine Learning models on the classified datasets. Leveraging these models, we extend our analysis to predict the classifications of the remaining 21,120 narratives. All the ML models excelled at classifying valid narratives as valid, but experienced challenges at simultaneously classifying invalid narratives as invalid. Our findings not only advance the study of LLM capabilities, limitations, and validity but also offer practical insights for narrative generation and natural language processing applications.
- [1085] arXiv:2402.05448 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Minecraft-ify: Minecraft Style Image Generation with Text-guided Image Editing for In-Game ApplicationComments: 2 pages, 2 figures. Accepted as Spotlight to NeurIPS 2023 Workshop on Machine Learning for Creativity and DesignSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG); Multimedia (cs.MM)
Abstract: In this paper, we first present the character texture generation system \textit{Minecraft-ify}, specified to Minecraft video game toward in-game application. Ours can generate face-focused image for texture mapping tailored to 3D virtual character having cube manifold. While existing projects or works only generate texture, proposed system can inverse the user-provided real image, or generate average/random appearance from learned distribution. Moreover, it can be manipulated with text-guidance using StyleGAN and StyleCLIP. These features provide a more extended user experience with enlarged freedom as a user-friendly AI-tool. Project page can be found at this https URL
- [1086] arXiv:2402.05457 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech RecognitionChen Chen , Ruizhe Li , Yuchen Hu , Sabato Marco Siniscalchi , Pin-Yu Chen , Ensiong Chng , Chao-Han Huck YangComments: Accepted to ICLR 2024, 17 pages. This work will be open sourced under MIT licenseSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract: Recent studies have successfully shown that large language models (LLMs) can be successfully used for generative error correction (GER) on top of the automatic speech recognition (ASR) output. Specifically, an LLM is utilized to carry out a direct mapping from the N-best hypotheses list generated by an ASR system to the predicted output transcription. However, despite its effectiveness, GER introduces extra data uncertainty since the LLM is trained without taking into account acoustic information available in the speech signal. In this work, we aim to overcome such a limitation by infusing acoustic information before generating the predicted transcription through a novel late fusion solution termed Uncertainty-Aware Dynamic Fusion (UADF). UADF is a multimodal fusion approach implemented into an auto-regressive decoding process and works in two stages: (i) It first analyzes and calibrates the token-level LLM decision, and (ii) it then dynamically assimilates the information from the acoustic modality. Experimental evidence collected from various ASR tasks shows that UADF surpasses existing fusion mechanisms in several ways. It yields significant improvements in word error rate (WER) while mitigating data uncertainty issues in LLM and addressing the poor generalization relied with sole modality during fusion. We also demonstrate that UADF seamlessly adapts to audio-visual speech recognition.
- [1087] arXiv:2402.05484 (cross-list from cs.SE) [ pdf , ps , other ]
-
Title: Leveraging AI for Enhanced Software Effort Estimation: A Comprehensive Study and Framework ProposalSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: This paper presents an extensive study on the application of AI techniques for software effort estimation in the past five years from 2017 to 2023. By overcoming the limitations of traditional methods, the study aims to improve accuracy and reliability. Through performance evaluation and comparison with diverse Machine Learning models, including Artificial Neural Network (ANN), Support Vector Machine (SVM), Linear Regression, Random Forest and other techniques, the most effective method is identified. The proposed AI-based framework holds the potential to enhance project planning and resource allocation, contributing to the research area of software project effort estimation.
- [1088] arXiv:2402.05493 (cross-list from cs.SE) [ pdf , ps , html , other ]
-
Title: Investigating White-Box Attacks for On-Device ModelsComments: Published in The International Conference on Software Engineering 2024 (ICSE'24)Subjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: Numerous mobile apps have leveraged deep learning capabilities. However, on-device models are vulnerable to attacks as they can be easily extracted from their corresponding mobile apps. Existing on-device attacking approaches only generate black-box attacks, which are far less effective and efficient than white-box strategies. This is because mobile deep learning frameworks like TFLite do not support gradient computing, which is necessary for white-box attacking algorithms. Thus, we argue that existing findings may underestimate the harmfulness of on-device attacks. To this end, we conduct a study to answer this research question: Can on-device models be directly attacked via white-box strategies? We first systematically analyze the difficulties of transforming the on-device model to its debuggable version, and propose a Reverse Engineering framework for On-device Models (REOM), which automatically reverses the compiled on-device TFLite model to the debuggable model. Specifically, REOM first transforms compiled on-device models into Open Neural Network Exchange format, then removes the non-debuggable parts, and converts them to the debuggable DL models format that allows attackers to exploit in a white-box setting. Our experimental results show that our approach is effective in achieving automated transformation among 244 TFLite models. Compared with previous attacks using surrogate models, REOM enables attackers to achieve higher attack success rates with a hundred times smaller attack perturbations. In addition, because the ONNX platform has plenty of tools for model format exchanging, the proposed method based on the ONNX platform can be adapted to other model formats. Our findings emphasize the need for developers to carefully consider their model deployment strategies, and use white-box methods to evaluate the vulnerability of on-device models.
- [1089] arXiv:2402.05512 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: GPTs Are Multilingual Annotators for Sequence Generation TasksComments: EACL 2024 Findings: Camera-ready versionSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Data annotation is an essential step for constructing new datasets. However, the conventional approach of data annotation through crowdsourcing is both time-consuming and expensive. In addition, the complexity of this process increases when dealing with low-resource languages owing to the difference in the language pool of crowdworkers. To address these issues, this study proposes an autonomous annotation method by utilizing large language models, which have been recently demonstrated to exhibit remarkable performance. Through our experiments, we demonstrate that the proposed method is not just cost-efficient but also applicable for low-resource language annotation. Additionally, we constructed an image captioning dataset using our approach and are committed to open this dataset for future study. We have opened our source code for further study and reproducibility.
- [1090] arXiv:2402.05515 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: NoisyICL: A Little Noise in Model Parameters Calibrates In-context LearningComments: 20 pages, 28 figures, 7 tables (5 pages, 4 figures, 1 table in main body). ACL 2024 under reviewSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In-Context Learning (ICL) is suffering from unsatisfactory performance and under-calibration due to high prior bias and unfaithful confidence. Some previous works fine-tuned language models for better ICL performance with enormous datasets and computing costs. In this paper, we propose NoisyICL, simply perturbing the model parameters by random noises to strive for better performance and calibration. Our experiments on two models and 12 downstream datasets show that NoisyICL can help ICL produce more accurate predictions. Our further analysis indicates that NoisyICL enables the model to provide more fair predictions, and also with more faithful confidence. Therefore, we believe that NoisyICL is an effective calibration of ICL. Our experimental code is uploaded to Github.
- [1091] arXiv:2402.05519 (cross-list from cs.DL) [ pdf , ps , other ]
-
Title: Can ChatGPT evaluate research quality?Journal-ref: Journal of Data and Information Science, 9(2), 1-21Subjects: Digital Libraries (cs.DL) ; Artificial Intelligence (cs.AI)
Abstract: Purpose: Assess whether ChatGPT 4.0 is accurate enough to perform research evaluations on journal articles to automate this time-consuming task. Design/methodology/approach: Test the extent to which ChatGPT-4 can assess the quality of journal articles using a case study of the published scoring guidelines of the UK Research Excellence Framework (REF) 2021 to create a research evaluation ChatGPT. This was applied to 51 of my own articles and compared against my own quality judgements. Findings: ChatGPT-4 can produce plausible document summaries and quality evaluation rationales that match the REF criteria. Its overall scores have weak correlations with my self-evaluation scores of the same documents (averaging r=0.281 over 15 iterations, with 8 being statistically significantly different from 0). In contrast, the average scores from the 15 iterations produced a statistically significant positive correlation of 0.509. Thus, averaging scores from multiple ChatGPT-4 rounds seems more effective than individual scores. The positive correlation may be due to ChatGPT being able to extract the author's significance, rigour, and originality claims from inside each paper. If my weakest articles are removed, then the correlation with average scores (r=0.200) falls below statistical significance, suggesting that ChatGPT struggles to make fine-grained evaluations. Research limitations: The data is self-evaluations of a convenience sample of articles from one academic in one field. Practical implications: Overall, ChatGPT does not yet seem to be accurate enough to be trusted for any formal or informal research quality evaluation tasks. Research evaluators, including journal editors, should therefore take steps to control its use. Originality/value: This is the first published attempt at post-publication expert review accuracy testing for ChatGPT.
- [1092] arXiv:2402.05521 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Linearizing Models for Efficient yet Robust Private InferenceSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: The growing concern about data privacy has led to the development of private inference (PI) frameworks in client-server applications which protects both data privacy and model IP. However, the cryptographic primitives required yield significant latency overhead which limits its wide-spread application. At the same time, changing environments demand the PI service to be robust against various naturally occurring and gradient-based perturbations. Despite several works focused on the development of latency-efficient models suitable for PI, the impact of these models on robustness has remained unexplored. Towards this goal, this paper presents RLNet, a class of robust linearized networks that can yield latency improvement via reduction of high-latency ReLU operations while improving the model performance on both clean and corrupted images. In particular, RLNet models provide a "triple win ticket" of improved classification accuracy on clean, naturally perturbed, and gradient-based perturbed images using a shared-mask shared-weight architecture with over an order of magnitude fewer ReLUs than baseline models. To demonstrate the efficacy of RLNet, we perform extensive experiments with ResNet and WRN model variants on CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets. Our experimental evaluations show that RLNet can yield models with up to 11.14x fewer ReLUs, with accuracy close to the all-ReLU models, on clean, naturally perturbed, and gradient-based perturbed images. Compared with the SoTA non-robust linearized models at similar ReLU budgets, RLNet achieves an improvement in adversarial accuracy of up to ~47%, naturally perturbed accuracy up to ~16.4%, while improving clean image accuracy up to ~1.5%.
- [1093] arXiv:2402.05525 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Differentially Private Model-Based Offline Reinforcement LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
Abstract: We address offline reinforcement learning with privacy guarantees, where the goal is to train a policy that is differentially private with respect to individual trajectories in the dataset. To achieve this, we introduce DP-MORL, an MBRL algorithm coming with differential privacy guarantees. A private model of the environment is first learned from offline data using DP-FedAvg, a training method for neural networks that provides differential privacy guarantees at the trajectory level. Then, we use model-based policy optimization to derive a policy from the (penalized) private model, without any further interaction with the system or access to the input data. We empirically show that DP-MORL enables the training of private RL agents from offline data and we furthermore outline the price of privacy in this setting.
- [1094] arXiv:2402.05541 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Reinforcement Learning as a Catalyst for Robust and Fair Federated Learning: Deciphering the Dynamics of Client ContributionsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: Recent advancements in federated learning (FL) have produced models that retain user privacy by training across multiple decentralized devices or systems holding local data samples. However, these strategies often neglect the inherent challenges of statistical heterogeneity and vulnerability to adversarial attacks, which can degrade model robustness and fairness. Personalized FL strategies offer some respite by adjusting models to fit individual client profiles, yet they tend to neglect server-side aggregation vulnerabilities. To address these issues, we propose Reinforcement Federated Learning (RFL), a novel framework that leverages deep reinforcement learning to adaptively optimize client contribution during aggregation, thereby enhancing both model robustness against malicious clients and fairness across participants under non-identically distributed settings. To achieve this goal, we propose a meticulous approach involving a Deep Deterministic Policy Gradient-based algorithm for continuous control of aggregation weights, an innovative client selection method based on model parameter distances, and a reward mechanism guided by validation set performance. Empirically, extensive experiments demonstrate that, in terms of robustness, RFL outperforms the state-of-the-art methods, while maintaining comparable levels of fairness, offering a promising solution to build resilient and fair federated systems.
- [1095] arXiv:2402.05546 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Offline Actor-Critic Reinforcement Learning Scales to Large ModelsJost Tobias Springenberg , Abbas Abdolmaleki , Jingwei Zhang , Oliver Groth , Michael Bloesch , Thomas Lampe , Philemon Brakel , Sarah Bechtle , Steven Kapturowski , Roland Hafner , Nicolas Heess , Martin RiedmillerSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: We show that offline actor-critic reinforcement learning can scale to large models - such as transformers - and follows similar scaling laws as supervised learning. We find that offline actor-critic algorithms can outperform strong, supervised, behavioral cloning baselines for multi-task training on a large dataset containing both sub-optimal and expert behavior on 132 continuous control tasks. We introduce a Perceiver-based actor-critic model and elucidate the key model features needed to make offline RL work with self- and cross-attention modules. Overall, we find that: i) simple offline actor critic algorithms are a natural choice for gradually moving away from the currently predominant paradigm of behavioral cloning, and ii) via offline RL it is possible to learn multi-task policies that master many domains simultaneously, including real robotics tasks, from sub-optimal demonstrations or self-generated data.
- [1096] arXiv:2402.05547 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Benchmarking Large Language Models on Communicative Medical Coaching: a Novel System and DatasetComments: NASubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Traditional applications of natural language processing (NLP) in healthcare have predominantly focused on patient-centered services, enhancing patient interactions and care delivery, such as through medical dialogue systems. However, the potential of NLP to benefit inexperienced doctors, particularly in areas such as communicative medical coaching, remains largely unexplored. We introduce ``ChatCoach,'' an integrated human-AI cooperative framework. Within this framework, both a patient agent and a coaching agent collaboratively support medical learners in practicing their medical communication skills during consultations. Unlike traditional dialogue systems, ChatCoach provides a simulated environment where a human doctor can engage in medical dialogue with a patient agent. Simultaneously, a coaching agent provides real-time feedback to the doctor. To construct the ChatCoach system, we developed a dataset and integrated Large Language Models such as ChatGPT and Llama2, aiming to assess their effectiveness in communicative medical coaching tasks. Our comparative analysis demonstrates that instruction-tuned Llama2 significantly outperforms ChatGPT's prompting-based approaches.
- [1097] arXiv:2402.05558 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Flashback: Understanding and Mitigating Forgetting in Federated LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: In Federated Learning (FL), forgetting, or the loss of knowledge across rounds, hampers algorithm convergence, particularly in the presence of severe data heterogeneity among clients. This study explores the nuances of this issue, emphasizing the critical role of forgetting in FL's inefficient learning within heterogeneous data contexts. Knowledge loss occurs in both client-local updates and server-side aggregation steps; addressing one without the other fails to mitigate forgetting. We introduce a metric to measure forgetting granularly, ensuring distinct recognition amid new knowledge acquisition. Leveraging these insights, we propose Flashback, an FL algorithm with a dynamic distillation approach that is used to regularize the local models, and effectively aggregate their knowledge. Across different benchmarks, Flashback outperforms other methods, mitigates forgetting, and achieves faster round-to-target-accuracy, by converging in 6 to 16 rounds.
- [1098] arXiv:2402.05563 (cross-list from math.NA) [ pdf , ps , other ]
-
Title: Neural Multigrid ArchitecturesSubjects: Numerical Analysis (math.NA) ; Artificial Intelligence (cs.AI)
Abstract: We propose a convenient matrix-free neural architecture for the multigrid method. The architecture is simple enough to be implemented in less than fifty lines of code, yet it encompasses a large number of distinct multigrid solvers. We argue that a fixed neural network without dense layers can not realize an efficient iterative method. Because of that, standard training protocols do not lead to competitive solvers. To overcome this difficulty, we use parameter sharing and serialization of layers. The resulting network can be trained on linear problems with thousands of unknowns and retains its efficiency on problems with millions of unknowns. From the point of view of numerical linear algebra network's training corresponds to finding optimal smoothers for the geometric multigrid method. We demonstrate our approach on a few second-order elliptic equations. For tested linear systems, we obtain from two to five times smaller spectral radius of the error propagation matrix compare to a basic linear multigrid with Jacobi smoother.
- [1099] arXiv:2402.05569 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Hypergraph Node Classification With Graph Neural NetworksSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Signal Processing (eess.SP); Machine Learning (stat.ML)
Abstract: Hypergraphs, with hyperedges connecting more than two nodes, are key for modelling higher-order interactions in real-world data. The success of graph neural networks (GNNs) reveals the capability of neural networks to process data with pairwise interactions. This inspires the usage of neural networks for data with higher-order interactions, thereby leading to the development of hypergraph neural networks (HyperGNNs). GNNs and HyperGNNs are typically considered distinct since they are designed for data on different geometric topologies. However, in this paper, we theoretically demonstrate that, in the context of node classification, most HyperGNNs can be approximated using a GNN with a weighted clique expansion of the hypergraph. This leads to WCE-GNN, a simple and efficient framework comprising a GNN and a weighted clique expansion (WCE), for hypergraph node classification. Experiments on nine real-world hypergraph node classification benchmarks showcase that WCE-GNN demonstrates not only higher classification accuracy compared to state-of-the-art HyperGNNs, but also superior memory and runtime efficiency.
- [1100] arXiv:2402.05575 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Simultaneously Achieving Group Exposure Fairness and Within-Group Meritocracy in Stochastic BanditsComments: Accepted in AAMAS 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Multiagent Systems (cs.MA)
Abstract: Existing approaches to fairness in stochastic multi-armed bandits (MAB) primarily focus on exposure guarantee to individual arms. When arms are naturally grouped by certain attribute(s), we propose Bi-Level Fairness, which considers two levels of fairness. At the first level, Bi-Level Fairness guarantees a certain minimum exposure to each group. To address the unbalanced allocation of pulls to individual arms within a group, we consider meritocratic fairness at the second level, which ensures that each arm is pulled according to its merit within the group. Our work shows that we can adapt a UCB-based algorithm to achieve a Bi-Level Fairness by providing (i) anytime Group Exposure Fairness guarantees and (ii) ensuring individual-level Meritocratic Fairness within each group. We first show that one can decompose regret bounds into two components: (a) regret due to anytime group exposure fairness and (b) regret due to meritocratic fairness within each group. Our proposed algorithm BF-UCB balances these two regrets optimally to achieve the upper bound of $O(\sqrt{T})$ on regret; $T$ being the stopping time. With the help of simulated experiments, we further show that BF-UCB achieves sub-linear regret; provides better group and individual exposure guarantees compared to existing algorithms; and does not result in a significant drop in reward with respect to UCB algorithm, which does not impose any fairness constraint.
- [1101] arXiv:2402.05584 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: AutoAugment Is What You Need: Enhancing Rule-based Augmentation Methods in Low-resource RegimesComments: EACL 2024 Student Research WorkshopSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Text data augmentation is a complex problem due to the discrete nature of sentences. Although rule-based augmentation methods are widely adopted in real-world applications because of their simplicity, they suffer from potential semantic damage. Previous researchers have suggested easy data augmentation with soft labels (softEDA), employing label smoothing to mitigate this problem. However, finding the best factor for each model and dataset is challenging; therefore, using softEDA in real-world applications is still difficult. In this paper, we propose adapting AutoAugment to solve this problem. The experimental results suggest that the proposed method can boost existing augmentation methods and that rule-based methods can enhance cutting-edge pre-trained language models. We offer the source code.
- [1102] arXiv:2402.05591 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: SoftEDA: Rethinking Rule-Based Data Augmentation with Soft LabelsComments: ICLR 2023 Tiny PapersSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Rule-based text data augmentation is widely used for NLP tasks due to its simplicity. However, this method can potentially damage the original meaning of the text, ultimately hurting the performance of the model. To overcome this limitation, we propose a straightforward technique for applying soft labels to augmented data. We conducted experiments across seven different classification tasks and empirically demonstrated the effectiveness of our proposed approach. We have publicly opened our source code for reproducibility.
- [1103] arXiv:2402.05593 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: A Concept for Reconstructing Stucco Statues from historic Sketches using synthetic Data onlyJournal-ref: Eurographics Workshop on Graphics and Cultural Heritage 2022Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: In medieval times, stuccoworkers used a red color, called sinopia, to first create a sketch of the to-be-made statue on the wall. Today, many of these statues are destroyed, but using the original drawings, deriving from the red color also called sinopia, we can reconstruct how the final statue might have looked.We propose a fully-automated approach to reconstruct a point cloud and show preliminary results by generating a color-image, a depth-map, as well as surface normals requiring only a single sketch, and without requiring a collection of other, similar samples. Our proposed solution allows real-time reconstruction on-site, for instance, within an exhibition, or to generate a useful starting point for an expert, trying to manually reconstruct the statue, all while using only synthetic data for training.
- [1104] arXiv:2402.05602 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: AttnLRP: Attention-Aware Layer-wise Relevance Propagation for TransformersReduan Achtibat , Sayed Mohammad Vakilzadeh Hatefi , Maximilian Dreyer , Aakriti Jain , Thomas Wiegand , Sebastian Lapuschkin , Wojciech SamekSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: Large Language Models are prone to biased predictions and hallucinations, underlining the paramount importance of understanding their model-internal reasoning process. However, achieving faithful attributions for the entirety of a black-box transformer model and maintaining computational efficiency is an unsolved challenge. By extending the Layer-wise Relevance Propagation attribution method to handle attention layers, we address these challenges effectively. While partial solutions exist, our method is the first to faithfully and holistically attribute not only input but also latent representations of transformer models with the computational efficiency similar to a singular backward pass. Through extensive evaluations against existing methods on Llama 2, Flan-T5 and the Vision Transformer architecture, we demonstrate that our proposed approach surpasses alternative methods in terms of faithfulness and enables the understanding of latent representations, opening up the door for concept-based explanations. We provide an open-source implementation on GitHub this https URL .
- [1105] arXiv:2402.05610 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Extending 6D Object Pose Estimators for Stereo VisionSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Estimating the 6D pose of objects accurately, quickly, and robustly remains a difficult task. However, recent methods for directly regressing poses from RGB images using dense features have achieved state-of-the-art results. Stereo vision, which provides an additional perspective on the object, can help reduce pose ambiguity and occlusion. Moreover, stereo can directly infer the distance of an object, while mono-vision requires internalized knowledge of the object's size. To extend the state-of-the-art in 6D object pose estimation to stereo, we created a BOP compatible stereo version of the YCB-V dataset. Our method outperforms state-of-the-art 6D pose estimation algorithms by utilizing stereo vision and can easily be adopted for other dense feature-based algorithms.
- [1106] arXiv:2402.05616 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Pretrained Generative Language Models as General Learning Frameworks for Sequence-Based TasksSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: We propose that small pretrained foundational generative language models with millions of parameters can be utilized as a general learning framework for sequence-based tasks. Our proposal overcomes the computational resource, skill set, and timeline challenges associated with training neural networks and language models from scratch. Further, our approach focuses on creating small and highly specialized models that can accurately execute a challenging task of which the base model is incapable of performing. We demonstrate that 125M, 350M, and 1.3B parameter pretrained foundational language models can be instruction fine-tuned with 10,000-to-1,000,000 instruction examples to achieve near state-of-the-art results on challenging cheminformatics tasks. We also demonstrate the role of successive language model fine-tuning epochs on improved outcomes, as well as the importance of both data formatting and pretrained foundational language model selection for instruction fine-tuning success.
- [1107] arXiv:2402.05624 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Efficient Models for the Detection of Hate, Abuse and ProfanityComments: 8 pages, 7 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Abstract: Large Language Models (LLMs) are the cornerstone for many Natural Language Processing (NLP) tasks like sentiment analysis, document classification, named entity recognition, question answering, summarization, etc. LLMs are often trained on data which originates from the web. This data is prone to having content with Hate, Abuse and Profanity (HAP). For a detailed definition of HAP, please refer to the Appendix. Due to the LLMs being exposed to HAP content during training, the models learn it and may then generate hateful or profane content. For example, when the open-source RoBERTa model (specifically, the RoBERTA base model) from the HuggingFace (HF) Transformers library is prompted to replace the mask token in `I do not know that Persian people are that MASK` it returns the word `stupid` with the highest score. This is unacceptable in civil discourse.The detection of Hate, Abuse and Profanity in text is a vital component of creating civil and unbiased LLMs, which is needed not only for English, but for all languages. In this article, we briefly describe the creation of HAP detectors and various ways of using them to make models civil and acceptable in the output they generate.
- [1108] arXiv:2402.05627 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Binding Dynamics in Rotating FeaturesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)
Abstract: In human cognition, the binding problem describes the open question of how the brain flexibly integrates diverse information into cohesive object representations. Analogously, in machine learning, there is a pursuit for models capable of strong generalization and reasoning by learning object-centric representations in an unsupervised manner. Drawing from neuroscientific theories, Rotating Features learn such representations by introducing vector-valued features that encapsulate object characteristics in their magnitudes and object affiliation in their orientations. The "$\chi$-binding" mechanism, embedded in every layer of the architecture, has been shown to be crucial, but remains poorly understood. In this paper, we propose an alternative "cosine binding" mechanism, which explicitly computes the alignment between features and adjusts weights accordingly, and we show that it achieves equivalent performance. This allows us to draw direct connections to self-attention and biological neural processes, and to shed light on the fundamental dynamics for object-centric representations to emerge in Rotating Features.
- [1109] arXiv:2402.05636 (cross-list from cs.SE) [ pdf , ps , other ]
-
Title: The Impact of AI Tool on Engineering at ANZ Bank An Empirical Study on GitHub Copilot within Corporate EnvironmentComments: 16 pages, 4 figures. in proceeding for 10th International Conference on Software Engineering (SEC 2024)Subjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: The increasing popularity of AI, particularly Large Language Models (LLMs), has significantly impacted various domains, including Software Engineering. This study explores the integration of AI tools in software engineering practices within a large organization. We focus on ANZ Bank, which employs over 5000 engineers covering all aspects of the software development life cycle. This paper details an experiment conducted using GitHub Copilot, a notable AI tool, within a controlled environment to evaluate its effectiveness in real-world engineering tasks. Additionally, this paper shares initial findings on the productivity improvements observed after GitHub Copilot was adopted on a large scale, with about 1000 engineers using it. ANZ Bank's six-week experiment with GitHub Copilot included two weeks of preparation and four weeks of active testing. The study evaluated participant sentiment and the tool's impact on productivity, code quality, and security. Initially, participants used GitHub Copilot for proposed use-cases, with their feedback gathered through regular surveys. In the second phase, they were divided into Control and Copilot groups, each tackling the same Python challenges, and their experiences were again surveyed. Results showed a notable boost in productivity and code quality with GitHub Copilot, though its impact on code security remained inconclusive. Participant responses were overall positive, confirming GitHub Copilot's effectiveness in large-scale software engineering environments. Early data from 1000 engineers also indicated a significant increase in productivity and job satisfaction.
- [1110] arXiv:2402.05643 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Improving Token-Based World Models with Parallel Observation PredictionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Motivated by the success of Transformers when applied to sequences of discrete symbols, token-based world models (TBWMs) were recently proposed as sample-efficient methods. In TBWMs, the world model consumes agent experience as a language-like sequence of tokens, where each observation constitutes a sub-sequence. However, during imagination, the sequential token-by-token generation of next observations results in a severe bottleneck, leading to long training times, poor GPU utilization, and limited representations. To resolve this bottleneck, we devise a novel Parallel Observation Prediction (POP) mechanism. POP augments a Retentive Network (RetNet) with a novel forward mode tailored to our reinforcement learning setting. We incorporate POP in a novel TBWM agent named REM (Retentive Environment Model), showcasing a 15.4x faster imagination compared to prior TBWMs. REM attains superhuman performance on 12 out of 26 games of the Atari 100K benchmark, while training in less than 12 hours. Our code is available at \url{ this https URL }.
- [1111] arXiv:2402.05650 (cross-list from cs.SE) [ pdf , ps , html , other ]
-
Title: Rocks Coding, Not Development--A Human-Centric, Experimental Evaluation of LLM-Supported SE TasksComments: The paper has been accepted by FSESubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: Recently, large language models (LLM) based generative AI has been gaining momentum for their impressive high-quality performances in multiple domains, particularly after the release of the ChatGPT. Many believe that they have the potential to perform general-purpose problem-solving in software development and replace human software developers. Nevertheless, there are in a lack of serious investigation into the capability of these LLM techniques in fulfilling software development tasks. In a controlled 2 x 2 between-subject experiment with 109 participants, we examined whether and to what degree working with ChatGPT was helpful in the coding task and typical software development task and how people work with ChatGPT. We found that while ChatGPT performed well in solving simple coding problems, its performance in supporting typical software development tasks was not that good. We also observed the interactions between participants and ChatGPT and found the relations between the interactions and the outcomes. Our study thus provides first-hand insights into using ChatGPT to fulfill software engineering tasks with real-world developers and motivates the need for novel interaction mechanisms that help developers effectively work with large language models to achieve desired outcomes.
- [1112] arXiv:2402.05660 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Rethinking Propagation for Unsupervised Graph Domain AdaptationComments: Accepted by AAAI-24Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Unsupervised Graph Domain Adaptation (UGDA) aims to transfer knowledge from a labelled source graph to an unlabelled target graph in order to address the distribution shifts between graph domains. Previous works have primarily focused on aligning data from the source and target graph in the representation space learned by graph neural networks (GNNs). However, the inherent generalization capability of GNNs has been largely overlooked. Motivated by our empirical analysis, we reevaluate the role of GNNs in graph domain adaptation and uncover the pivotal role of the propagation process in GNNs for adapting to different graph domains. We provide a comprehensive theoretical analysis of UGDA and derive a generalization bound for multi-layer GNNs. By formulating GNN Lipschitz for k-layer GNNs, we show that the target risk bound can be tighter by removing propagation layers in source graph and stacking multiple propagation layers in target graph. Based on the empirical and theoretical analysis mentioned above, we propose a simple yet effective approach called A2GNN for graph domain adaptation. Through extensive experiments on real-world datasets, we demonstrate the effectiveness of our proposed A2GNN framework.
- [1113] arXiv:2402.05663 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Mesoscale Traffic Forecasting for Real-Time Bottleneck and Shockwave PredictionRaphael Chekroun , Han Wang , Jonathan Lee , Marin Toromanoff , Sascha Hornauer , Fabien Moutarde , Maria Laura Delle MonacheSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: Accurate real-time traffic state forecasting plays a pivotal role in traffic control research. In particular, the CIRCLES consortium project necessitates predictive techniques to mitigate the impact of data source delays. After the success of the MegaVanderTest experiment, this paper aims at overcoming the current system limitations and develop a more suited approach to improve the real-time traffic state estimation for the next iterations of the experiment. In this paper, we introduce the SA-LSTM, a deep forecasting method integrating Self-Attention (SA) on the spatial dimension with Long Short-Term Memory (LSTM) yielding state-of-the-art results in real-time mesoscale traffic forecasting. We extend this approach to multi-step forecasting with the n-step SA-LSTM, which outperforms traditional multi-step forecasting methods in the trade-off between short-term and long-term predictions, all while operating in real-time.
- [1114] arXiv:2402.05668 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: Comprehensive Assessment of Jailbreak Attacks Against LLMsComments: 18 pages, 12 figuresSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Misuse of the Large Language Models (LLMs) has raised widespread concern. To address this issue, safeguards have been taken to ensure that LLMs align with social ethics. However, recent findings have revealed an unsettling vulnerability bypassing the safeguards of LLMs, known as jailbreak attacks. By applying techniques, such as employing role-playing scenarios, adversarial examples, or subtle subversion of safety objectives as a prompt, LLMs can produce an inappropriate or even harmful response. While researchers have studied several categories of jailbreak attacks, they have done so in isolation. To fill this gap, we present the first large-scale measurement of various jailbreak attack methods. We concentrate on 13 cutting-edge jailbreak methods from four categories, 160 questions from 16 violation categories, and six popular LLMs. Our extensive experimental results demonstrate that the optimized jailbreak prompts consistently achieve the highest attack success rates, as well as exhibit robustness across different LLMs. Some jailbreak prompt datasets, available from the Internet, can also achieve high attack success rates on many LLMs, such as ChatGLM3, GPT-3.5, and PaLM2. Despite the claims from many organizations regarding the coverage of violation categories in their policies, the attack success rates from these categories remain high, indicating the challenges of effectively aligning LLM policies and the ability to counter jailbreak attacks. We also discuss the trade-off between the attack performance and efficiency, as well as show that the transferability of the jailbreak prompts is still viable, becoming an option for black-box models. Overall, our research highlights the necessity of evaluating different jailbreak methods. We hope our study can provide insights for future research on jailbreak attacks and serve as a benchmark tool for evaluating them for practitioners.
- [1115] arXiv:2402.05680 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Interpretable classifiers for tabular data via discretization and feature selectionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
Abstract: We introduce a method for computing immediately human interpretable yet accurate classifiers from tabular data. The classifiers obtained are short DNF-formulas, computed via first discretizing the original data to Boolean form and then using feature selection coupled with a very fast algorithm for producing the best possible Boolean classifier for the setting. We demonstrate the approach via 14 experiments, obtaining results with accuracies mainly similar to ones obtained via random forests, XGBoost, and existing results for the same datasets in the literature. In several cases, our approach in fact outperforms the reference results in relation to accuracy, even though the main objective of our study is the immediate interpretability of our classifiers. We also prove a new result on the probability that the classifier we obtain from real-life data corresponds to the ideally best classifier with respect to the background distribution the data comes from.
- [1116] arXiv:2402.05699 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Self-Alignment of Large Language Models via Monopolylogue-based Social Scene SimulationComments: 36 pages, 9 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: Aligning large language models (LLMs) with human values is imperative to mitigate potential adverse effects resulting from their misuse. Drawing from the sociological insight that acknowledging all parties' concerns is a key factor in shaping human values, this paper proposes a novel direction to align LLMs by themselves: social scene simulation. To achieve this, we present MATRIX, a novel social scene simulator that emulates realistic scenes around a user's input query, enabling the LLM to take social consequences into account before responding. MATRIX serves as a virtual rehearsal space, akin to a Monopolylogue, where the LLM performs diverse roles related to the query and practice by itself. To inject this alignment, we fine-tune the LLM with MATRIX-simulated data, ensuring adherence to human values without compromising inference speed. We theoretically show that the LLM with MATRIX outperforms Constitutional AI under mild assumptions. Finally, extensive experiments validate that our method outperforms over 10 baselines across 4 benchmarks. As evidenced by 875 user ratings, our tuned 13B-size LLM exceeds GPT-4 in aligning with human values. Our project page is available at this https URL .
- [1117] arXiv:2402.05703 (cross-list from cs.MA) [ pdf , ps , other ]
-
Title: Offline Risk-sensitive RL with Partial Observability to Enhance Performance in Human-Robot TeamingGiorgio Angelotti , Caroline P. C. Chanel , Adam H. M. Pinto , Christophe Lounis , Corentin Chauffaut , Nicolas DrougardComments: Accepted as a full paper at AAMAS 2024Subjects: Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Robotics (cs.RO)
Abstract: The integration of physiological computing into mixed-initiative human-robot interaction systems offers valuable advantages in autonomous task allocation by incorporating real-time features as human state observations into the decision-making system. This approach may alleviate the cognitive load on human operators by intelligently allocating mission tasks between agents. Nevertheless, accommodating a diverse pool of human participants with varying physiological and behavioral measurements presents a substantial challenge. To address this, resorting to a probabilistic framework becomes necessary, given the inherent uncertainty and partial observability on the human's state. Recent research suggests to learn a Partially Observable Markov Decision Process (POMDP) model from a data set of previously collected experiences that can be solved using Offline Reinforcement Learning (ORL) methods. In the present work, we not only highlight the potential of partially observable representations and physiological measurements to improve human operator state estimation and performance, but also enhance the overall mission effectiveness of a human-robot team. Importantly, as the fixed data set may not contain enough information to fully represent complex stochastic processes, we propose a method to incorporate model uncertainty, thus enabling risk-sensitive sequential decision-making. Experiments were conducted with a group of twenty-six human participants within a simulated robot teleoperation environment, yielding empirical evidence of the method's efficacy. The obtained adaptive task allocation policy led to statistically significant higher scores than the one that was used to collect the data set, allowing for generalization across diverse participants also taking into account risk-sensitive metrics.
- [1118] arXiv:2402.05712 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: DiffSpeaker: Speech-Driven 3D Facial Animation with Diffusion TransformerComments: 9 pages, 5 figures. Code is avalable at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Speech-driven 3D facial animation is important for many multimedia applications. Recent work has shown promise in using either Diffusion models or Transformer architectures for this task. However, their mere aggregation does not lead to improved performance. We suspect this is due to a shortage of paired audio-4D data, which is crucial for the Transformer to effectively perform as a denoiser within the Diffusion framework. To tackle this issue, we present DiffSpeaker, a Transformer-based network equipped with novel biased conditional attention modules. These modules serve as substitutes for the traditional self/cross-attention in standard Transformers, incorporating thoughtfully designed biases that steer the attention mechanisms to concentrate on both the relevant task-specific and diffusion-related conditions. We also explore the trade-off between accurate lip synchronization and non-verbal facial expressions within the Diffusion paradigm. Experiments show our model not only achieves state-of-the-art performance on existing benchmarks, but also fast inference speed owing to its ability to generate facial motions in parallel.
- [1119] arXiv:2402.05713 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Hidden in Plain Sight: Undetectable Adversarial Bias Attacks on Vulnerable Patient PopulationsComments: 29 pages, 4 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: The proliferation of artificial intelligence (AI) in radiology has shed light on the risk of deep learning (DL) models exacerbating clinical biases towards vulnerable patient populations. While prior literature has focused on quantifying biases exhibited by trained DL models, demographically targeted adversarial bias attacks on DL models and its implication in the clinical environment remains an underexplored field of research in medical imaging. In this work, we demonstrate that demographically targeted label poisoning attacks can introduce undetectable underdiagnosis bias in DL models. Our results across multiple performance metrics and demographic groups like sex, age, and their intersectional subgroups show that adversarial bias attacks demonstrate high-selectivity for bias in the targeted group by degrading group model performance without impacting overall model performance. Furthermore, our results indicate that adversarial bias attacks result in biased DL models that propagate prediction bias even when evaluated with external datasets.
- [1120] arXiv:2402.05724 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Model-Based RL for Mean-Field Games is not Statistically Harder than Single-Agent RLComments: 49 PagesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Machine Learning (stat.ML)
Abstract: We study the sample complexity of reinforcement learning (RL) in Mean-Field Games (MFGs) with model-based function approximation that requires strategic exploration to find a Nash Equilibrium policy. We introduce the Partial Model-Based Eluder Dimension (P-MBED), a more effective notion to characterize the model class complexity. Notably, P-MBED measures the complexity of the single-agent model class converted from the given mean-field model class, and potentially, can be exponentially lower than the MBED proposed by \citet{huang2023statistical}. We contribute a model elimination algorithm featuring a novel exploration strategy and establish sample complexity results polynomial w.r.t.~P-MBED. Crucially, our results reveal that, under the basic realizability and Lipschitz continuity assumptions, \emph{learning Nash Equilibrium in MFGs is no more statistically challenging than solving a logarithmic number of single-agent RL problems}. We further extend our results to Multi-Type MFGs, generalizing from conventional MFGs and involving multiple types of agents. This extension implies statistical tractability of a broader class of Markov Games through the efficacy of mean-field approximation. Finally, inspired by our theoretical algorithm, we present a heuristic approach with improved computational efficiency and empirically demonstrate its effectiveness.
- [1121] arXiv:2402.05741 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: Real-World Robot Applications of Foundation Models: A ReviewKento Kawaharazuka , Tatsuya Matsushima , Andrew Gambardella , Jiaxian Guo , Chris Paxton , Andy ZengSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: Recent developments in foundation models, like Large Language Models (LLMs) and Vision-Language Models (VLMs), trained on extensive data, facilitate flexible application across different tasks and modalities. Their impact spans various fields, including healthcare, education, and robotics. This paper provides an overview of the practical application of foundation models in real-world robotics, with a primary emphasis on the replacement of specific components within existing robot systems. The summary encompasses the perspective of input-output relationships in foundation models, as well as their role in perception, motion planning, and control within the field of robotics. This paper concludes with a discussion of future challenges and implications for practical robot applications.
- [1122] arXiv:2402.05747 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Jacquard V2: Refining Datasets using the Human In the Loop Data Correction MethodSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: In the context of rapid advancements in industrial automation, vision-based robotic grasping plays an increasingly crucial role. In order to enhance visual recognition accuracy, the utilization of large-scale datasets is imperative for training models to acquire implicit knowledge related to the handling of various objects. Creating datasets from scratch is a time and labor-intensive process. Moreover, existing datasets often contain errors due to automated annotations aimed at expediency, making the improvement of these datasets a substantial research challenge. Consequently, several issues have been identified in the annotation of grasp bounding boxes within the popular Jacquard Grasp. We propose utilizing a Human-In-The-Loop(HIL) method to enhance dataset quality. This approach relies on backbone deep learning networks to predict object positions and orientations for robotic grasping. Predictions with Intersection over Union (IOU) values below 0.2 undergo an assessment by human operators. After their evaluation, the data is categorized into False Negatives(FN) and True Negatives(TN). FN are then subcategorized into either missing annotations or catastrophic labeling errors. Images lacking labels are augmented with valid grasp bounding box information, whereas images afflicted by catastrophic labeling errors are completely removed. The open-source tool Labelbee was employed for 53,026 iterations of HIL dataset enhancement, leading to the removal of 2,884 images and the incorporation of ground truth information for 30,292 images. The enhanced dataset, named the Jacquard V2 Grasping Dataset, served as the training data for a range of neural networks.
- [1123] arXiv:2402.05749 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Generalized Preference Optimization: A Unified Approach to Offline AlignmentYunhao Tang , Zhaohan Daniel Guo , Zeyu Zheng , Daniele Calandriello , Rémi Munos , Mark Rowland , Pierre Harvey Richemond , Michal Valko , Bernardo Ávila Pires , Bilal PiotSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Offline preference optimization allows fine-tuning large models directly from offline data, and has proved effective in recent alignment practices. We propose generalized preference optimization (GPO), a family of offline losses parameterized by a general class of convex functions. GPO enables a unified view over preference optimization, encompassing existing algorithms such as DPO, IPO and SLiC as special cases, while naturally introducing new variants. The GPO framework also sheds light on how offline algorithms enforce regularization, through the design of the convex function that defines the loss. Our analysis and experiments reveal the connections and subtle differences between the offline regularization and the KL divergence regularization intended by the canonical RLHF formulation. In all, our results present new algorithmic toolkits and empirical insights to alignment practitioners.
- [1124] arXiv:2402.05774 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Stable Autonomous Flow MatchingComments: In submissionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract: In contexts where data samples represent a physically stable state, it is often assumed that the data points represent the local minima of an energy landscape. In control theory, it is well-known that energy can serve as an effective Lyapunov function. Despite this, connections between control theory and generative models in the literature are sparse, even though there are several machine learning applications with physically stable data points. In this paper, we focus on such data and a recent class of deep generative models called flow matching. We apply tools of stochastic stability for time-independent systems to flow matching models. In doing so, we characterize the space of flow matching models that are amenable to this treatment, as well as draw connections to other control theory principles. We demonstrate our theoretical results on two examples.
- [1125] arXiv:2402.05782 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Analysing the Sample Complexity of Opponent ShapingJournal-ref: AAMAS 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA)
Abstract: Learning in general-sum games often yields collectively sub-optimal results. Addressing this, opponent shaping (OS) methods actively guide the learning processes of other agents, empirically leading to improved individual and group performances in many settings. Early OS methods use higher-order derivatives to shape the learning of co-players, making them unsuitable for shaping multiple learning steps. Follow-up work, Model-free Opponent Shaping (M-FOS), addresses these by reframing the OS problem as a meta-game. In contrast to early OS methods, there is little theoretical understanding of the M-FOS framework. Providing theoretical guarantees for M-FOS is hard because A) there is little literature on theoretical sample complexity bounds for meta-reinforcement learning B) M-FOS operates in continuous state and action spaces, so theoretical analysis is challenging. In this work, we present R-FOS, a tabular version of M-FOS that is more suitable for theoretical analysis. R-FOS discretises the continuous meta-game MDP into a tabular MDP. Within this discretised MDP, we adapt the $R_{max}$ algorithm, most prominently used to derive PAC-bounds for MDPs, as the meta-learner in the R-FOS algorithm. We derive a sample complexity bound that is exponential in the cardinality of the inner state and action space and the number of agents. Our bound guarantees that, with high probability, the final policy learned by an R-FOS agent is close to the optimal policy, apart from a constant factor. Finally, we investigate how R-FOS's sample complexity scales in the size of state-action space. Our theoretical results on scaling are supported empirically in the Matching Pennies environment.
- [1126] arXiv:2402.05785 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Limits of Transformer Language Models on Learning Algorithmic CompositionsJonathan Thomm , Aleksandar Terzic , Geethan Karunaratne , Giacomo Camposampiero , Bernhard Schölkopf , Abbas RahimiSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: We analyze the capabilities of Transformer language models on learning discrete algorithms. To this end, we introduce two new tasks demanding the composition of several discrete sub-tasks. On both training LLaMA models from scratch and prompting on GPT-4 and Gemini we measure learning compositions of learned primitives. We observe that the compositional capabilities of state-of-the-art Transformer language models are very limited and sample-wise scale worse than relearning all sub-tasks for a new algorithmic composition. We also present a theorem in complexity theory, showing that gradient descent on memorizing feedforward models can be exponentially data inefficient.
- [1127] arXiv:2402.05794 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Phonetically rich corpus construction for a low-resourced languageSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Speech technologies rely on capturing a speaker's voice variability while obtaining comprehensive language information. Textual prompts and sentence selection methods have been proposed in the literature to comprise such adequate phonetic data, referred to as a phonetically rich \textit{corpus}. However, they are still insufficient for acoustic modeling, especially critical for languages with limited resources. Hence, this paper proposes a novel approach and outlines the methodological aspects required to create a \textit{corpus} with broad phonetic coverage for a low-resourced language, Brazilian Portuguese. Our methodology includes text dataset collection up to a sentence selection algorithm based on triphone distribution. Furthermore, we propose a new phonemic classification according to acoustic-articulatory speech features since the absolute number of distinct triphones, or low-probability triphones, does not guarantee an adequate representation of every possible combination. Using our algorithm, we achieve a 55.8\% higher percentage of distinct triphones -- for samples of similar size -- while the currently available phonetic-rich corpus, CETUC and TTS-Portuguese, 12.6\% and 12.3\% in comparison to a non-phonetically rich dataset.
- [1128] arXiv:2402.05804 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: InkSight: Offline-to-Online Handwriting Conversion by Learning to Read and WriteBlagoj Mitrevski , Arina Rak , Julian Schnitzler , Chengkun Li , Andrii Maksai , Jesse Berent , Claudiu MusatSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Digital note-taking is gaining popularity, offering a durable, editable, and easily indexable way of storing notes in the vectorized form, known as digital ink. However, a substantial gap remains between this way of note-taking and traditional pen-and-paper note-taking, a practice still favored by a vast majority. Our work, InkSight, aims to bridge the gap by empowering physical note-takers to effortlessly convert their work (offline handwriting) to digital ink (online handwriting), a process we refer to as Derendering. Prior research on the topic has focused on the geometric properties of images, resulting in limited generalization beyond their training domains. Our approach combines reading and writing priors, allowing training a model in the absence of large amounts of paired samples, which are difficult to obtain. To our knowledge, this is the first work that effectively derenders handwritten text in arbitrary photos with diverse visual characteristics and backgrounds. Furthermore, it generalizes beyond its training domain into simple sketches. Our human evaluation reveals that 87% of the samples produced by our model on the challenging HierText dataset are considered as a valid tracing of the input image and 67% look like a pen trajectory traced by a human. Interactive visualizations of 100 word-level model outputs for each of the three public datasets are available in our Hugging Face space: this https URL . Model release is in progress.
- [1129] arXiv:2402.05809 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: You Only Need One Color Space: An Efficient Network for Low-light Image EnhancementSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Low-Light Image Enhancement (LLIE) task tends to restore the details and visual information from corrupted low-light images. Most existing methods learn the mapping function between low/normal-light images by Deep Neural Networks (DNNs) on sRGB and HSV color space. Nevertheless, enhancement involves amplifying image signals, and applying these color spaces to low-light images with a low signal-to-noise ratio can introduce sensitivity and instability into the enhancement process. Consequently, this results in the presence of color artifacts and brightness artifacts in the enhanced images. To alleviate this problem, we propose a novel trainable color space, named Horizontal/Vertical-Intensity (HVI). It not only decouples brightness and color from RGB channels to mitigate the instability during enhancement but also adapts to low-light images in different illumination ranges due to the trainable parameters. Further, we design a novel Color and Intensity Decoupling Network (CIDNet) with two branches dedicated to processing the decoupled image brightness and color in the HVI space. Within CIDNet, we introduce the Lightweight Cross-Attention (LCA) module to facilitate interaction between image structure and content information in both branches, while also suppressing noise in low-light images. Finally, we conducted 22 quantitative and qualitative experiments to show that the proposed CIDNet outperforms the state-of-the-art methods on 11 datasets. The code is available at this https URL .
- [1130] arXiv:2402.05813 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The aim of this study is to investigate Machine Unlearning (MU), a burgeoning field focused on addressing concerns related to neural models inadvertently retaining personal or sensitive data. Here, a novel approach is introduced to achieve precise and selective forgetting within language models. Unlike previous methodologies that adopt completely opposing training objectives, this approach aims to mitigate adverse effects on language model performance, particularly in generation tasks. Furthermore, two innovative evaluation metrics are proposed: Sensitive Information Extraction Likelihood (S-EL) and Sensitive Information Memory Accuracy (S-MA), designed to gauge the effectiveness of sensitive information elimination. To reinforce the forgetting framework, an effective method for annotating sensitive scopes is presented, involving both online and offline strategies. The online selection mechanism leverages language probability scores to ensure computational efficiency, while the offline annotation entails a robust two-stage process based on Large Language Models (LLMs).
- [1131] arXiv:2402.05823 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: FusionSF: Fuse Heterogeneous Modalities in a Vector Quantized Framework for Robust Solar Power ForecastingSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Accurate solar power forecasting is crucial to integrate photovoltaic plants into the electric grid, schedule and secure the power grid safety. This problem becomes more demanding for those newly installed solar plants which lack sufficient data. Current research predominantly relies on historical solar power data or numerical weather prediction in a single-modality format, ignoring the complementary information provided in different modalities. In this paper, we propose a multi-modality fusion framework to integrate historical power data, numerical weather prediction, and satellite images, significantly improving forecast performance. We introduce a vector quantized framework that aligns modalities with varying information densities, striking a balance between integrating sufficient information and averting model overfitting. Our framework demonstrates strong zero-shot forecasting capability, which is especially useful for those newly installed plants. Moreover, we collect and release a multi-modal solar power (MMSP) dataset from real-world plants to further promote the research of multi-modal solar forecasting algorithms. Our extensive experiments show that our model not only operates with robustness but also boosts accuracy in both zero-shot forecasting and scenarios rich with training data, surpassing leading models. We have incorporated it into our eForecaster platform and deployed it for more than 300 solar plants with a capacity of over 15GW.
- [1132] arXiv:2402.05828 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Discovering Temporally-Aware Reinforcement Learning AlgorithmsMatthew Thomas Jackson , Chris Lu , Louis Kirsch , Robert Tjarko Lange , Shimon Whiteson , Jakob Nicolaus FoersterComments: Published at ICLR 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Recent advancements in meta-learning have enabled the automatic discovery of novel reinforcement learning algorithms parameterized by surrogate objective functions. To improve upon manually designed algorithms, the parameterization of this learned objective function must be expressive enough to represent novel principles of learning (instead of merely recovering already established ones) while still generalizing to a wide range of settings outside of its meta-training distribution. However, existing methods focus on discovering objective functions that, like many widely used objective functions in reinforcement learning, do not take into account the total number of steps allowed for training, or "training horizon". In contrast, humans use a plethora of different learning objectives across the course of acquiring a new ability. For instance, students may alter their studying techniques based on the proximity to exam deadlines and their self-assessed capabilities. This paper contends that ignoring the optimization time horizon significantly restricts the expressive potential of discovered learning algorithms. We propose a simple augmentation to two existing objective discovery approaches that allows the discovered algorithm to dynamically update its objective function throughout the agent's training procedure, resulting in expressive schedules and increased generalization across different training horizons. In the process, we find that commonly used meta-gradient approaches fail to discover such adaptive objective functions while evolution strategies discover highly dynamic learning rules. We demonstrate the effectiveness of our approach on a wide range of tasks and analyze the resulting learned algorithms, which we find effectively balance exploration and exploitation by modifying the structure of their learning rules throughout the agent's lifetime.
- [1133] arXiv:2402.05830 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Sparse-VQ Transformer: An FFN-Free Framework with Vector Quantization for Enhanced Time Series ForecastingSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Time series analysis is vital for numerous applications, and transformers have become increasingly prominent in this domain. Leading methods customize the transformer architecture from NLP and CV, utilizing a patching technique to convert continuous signals into segments. Yet, time series data are uniquely challenging due to significant distribution shifts and intrinsic noise levels. To address these two challenges,we introduce the Sparse Vector Quantized FFN-Free Transformer (Sparse-VQ). Our methodology capitalizes on a sparse vector quantization technique coupled with Reverse Instance Normalization (RevIN) to reduce noise impact and capture sufficient statistics for forecasting, serving as an alternative to the Feed-Forward layer (FFN) in the transformer architecture. Our FFN-free approach trims the parameter count, enhancing computational efficiency and reducing overfitting. Through evaluations across ten benchmark datasets, including the newly introduced CAISO dataset, Sparse-VQ surpasses leading models with a 7.84% and 4.17% decrease in MAE for univariate and multivariate time series forecasting, respectively. Moreover, it can be seamlessly integrated with existing transformer-based models to elevate their performance.
- [1134] arXiv:2402.05862 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Let Your Graph Do the Talking: Encoding Structured Data for LLMsBryan Perozzi , Bahare Fatemi , Dustin Zelle , Anton Tsitsulin , Mehran Kazemi , Rami Al-Rfou , Jonathan HalcrowSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Abstract: How can we best encode structured data into sequential form for use in large language models (LLMs)? In this work, we introduce a parameter-efficient method to explicitly represent structured data for LLMs. Our method, GraphToken, learns an encoding function to extend prompts with explicit structured information. Unlike other work which focuses on limited domains (e.g. knowledge graph representation), our work is the first effort focused on the general encoding of structured data to be used for various reasoning tasks. We show that explicitly representing the graph structure allows significant improvements to graph reasoning tasks. Specifically, we see across the board improvements - up to 73% points - on node, edge and, graph-level tasks from the GraphQA benchmark.
- [1135] arXiv:2402.05868 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: EmojiCrypt: Prompt Encryption for Secure Communication with Large Language ModelsComments: 12 pages, 4 figures, 2 tables, comments and suggestions are welcomeSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Abstract: Cloud-based large language models (LLMs) such as ChatGPT have increasingly become integral to daily operations, serving as vital tools across various applications. While these models offer substantial benefits in terms of accessibility and functionality, they also introduce significant privacy concerns: the transmission and storage of user data in cloud infrastructures pose substantial risks of data breaches and unauthorized access to sensitive information; even if the transmission and storage of data is encrypted, the LLM service provider itself still knows the real contents of the data, preventing individuals or entities from confidently using such LLM services. To address these concerns, this paper proposes a simple yet effective mechanism EmojiCrypt to protect user privacy. It uses Emoji to encrypt the user inputs before sending them to LLM, effectively rendering them indecipherable to human or LLM's examination while retaining the original intent of the prompt, thus ensuring the model's performance remains unaffected. We conduct experiments on three tasks, personalized recommendation, sentiment analysis, and tabular data analysis. Experiment results reveal that EmojiCrypt can encrypt personal information within prompts in such a manner that not only prevents the discernment of sensitive data by humans or LLM itself, but also maintains or even improves the precision without further tuning, achieving comparable or even better task accuracy than directly prompting the LLM without prompt encryption. These results highlight the practicality of adopting encryption measures that safeguard user privacy without compromising the functional integrity and performance of LLMs. Code and dataset are available at this https URL .
- [1136] arXiv:2402.05880 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Generative Echo Chamber? Effects of LLM-Powered Search Systems on Diverse Information SeekingComments: Accepted in CHI'24. Supplementary material will be available online with the official submission in CHI 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Abstract: Large language models (LLMs) powered conversational search systems have already been used by hundreds of millions of people, and are believed to bring many benefits over conventional search. However, while decades of research and public discourse interrogated the risk of search systems in increasing selective exposure and creating echo chambers -- limiting exposure to diverse opinions and leading to opinion polarization, little is known about such a risk of LLM-powered conversational search. We conduct two experiments to investigate: 1) whether and how LLM-powered conversational search increases selective exposure compared to conventional search; 2) whether and how LLMs with opinion biases that either reinforce or challenge the user's view change the effect. Overall, we found that participants engaged in more biased information querying with LLM-powered conversational search, and an opinionated LLM reinforcing their views exacerbated this bias. These results present critical implications for the development of LLMs and conversational search systems, and the policy governing these technologies.
- [1137] arXiv:2402.05889 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: CREMA: Multimodal Compositional Video Reasoning via Efficient Modular Adaptation and FusionComments: project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Despite impressive advancements in multimodal compositional reasoning approaches, they are still limited in their flexibility and efficiency by processing fixed modality inputs while updating a lot of model parameters. This paper tackles these critical challenges and proposes CREMA, an efficient and modular modality-fusion framework for injecting any new modality into video reasoning. We first augment multiple informative modalities (such as optical flow, 3D point cloud, audio) from given videos without extra human annotation by leveraging existing pre-trained models. Next, we introduce a query transformer with multiple parameter-efficient modules associated with each accessible modality. It projects diverse modality features to the LLM token embedding space, allowing the model to integrate different data types for response generation. Furthermore, we propose a fusion module designed to compress multimodal queries, maintaining computational efficiency in the LLM while combining additional modalities. We validate our method on video-3D, video-audio, and video-language reasoning tasks and achieve better/equivalent performance against strong multimodal LLMs, including BLIP-2, 3D-LLM, and SeViLA while using 96% fewer trainable parameters. We provide extensive analyses of CREMA, including the impact of each modality on reasoning domains, the design of the fusion module, and example visualizations.
- [1138] arXiv:2402.05902 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: ClickSAM: Fine-tuning Segment Anything Model using click prompts for ultrasound image segmentationComments: 5 pages, 2 figures, SPIE Medical Imaging Conference 2024. Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Medical Physics (physics.med-ph)
Abstract: The newly released Segment Anything Model (SAM) is a popular tool used in image processing due to its superior segmentation accuracy, variety of input prompts, training capabilities, and efficient model design. However, its current model is trained on a diverse dataset not tailored to medical images, particularly ultrasound images. Ultrasound images tend to have a lot of noise, making it difficult to segment out important structures. In this project, we developed ClickSAM, which fine-tunes the Segment Anything Model using click prompts for ultrasound images. ClickSAM has two stages of training: the first stage is trained on single-click prompts centered in the ground-truth contours, and the second stage focuses on improving the model performance through additional positive and negative click prompts. By comparing the first stage predictions to the ground-truth masks, true positive, false positive, and false negative segments are calculated. Positive clicks are generated using the true positive and false negative segments, and negative clicks are generated using the false positive segments. The Centroidal Voronoi Tessellation algorithm is then employed to collect positive and negative click prompts in each segment that are used to enhance the model performance during the second stage of training. With click-train methods, ClickSAM exhibits superior performance compared to other existing models for ultrasound image segmentation.
- [1139] arXiv:2402.05906 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Risk-Sensitive Multi-Agent Reinforcement Learning in Network Aggregative Markov GamesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Abstract: Classical multi-agent reinforcement learning (MARL) assumes risk neutrality and complete objectivity for agents. However, in settings where agents need to consider or model human economic or social preferences, a notion of risk must be incorporated into the RL optimization problem. This will be of greater importance in MARL where other human or non-human agents are involved, possibly with their own risk-sensitive policies. In this work, we consider risk-sensitive and non-cooperative MARL with cumulative prospect theory (CPT), a non-convex risk measure and a generalization of coherent measures of risk. CPT is capable of explaining loss aversion in humans and their tendency to overestimate/underestimate small/large probabilities. We propose a distributed sampling-based actor-critic (AC) algorithm with CPT risk for network aggregative Markov games (NAMGs), which we call Distributed Nested CPT-AC. Under a set of assumptions, we prove the convergence of the algorithm to a subjective notion of Markov perfect Nash equilibrium in NAMGs. The experimental results show that subjective CPT policies obtained by our algorithm can be different from the risk-neutral ones, and agents with a higher loss aversion are more inclined to socially isolate themselves in an NAMG.
- [1140] arXiv:2402.05932 (cross-list from cs.RO) [ pdf , ps , html , other ]
-
Title: Driving Everywhere with Large Language Model Policy AdaptationComments: CVPR 2024, featured in GTC 2024: this https URLSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Adapting driving behavior to new environments, customs, and laws is a long-standing problem in autonomous driving, precluding the widespread deployment of autonomous vehicles (AVs). In this paper, we present LLaDA, a simple yet powerful tool that enables human drivers and autonomous vehicles alike to drive everywhere by adapting their tasks and motion plans to traffic rules in new locations. LLaDA achieves this by leveraging the impressive zero-shot generalizability of large language models (LLMs) in interpreting the traffic rules in the local driver handbook. Through an extensive user study, we show that LLaDA's instructions are useful in disambiguating in-the-wild unexpected situations. We also demonstrate LLaDA's ability to adapt AV motion planning policies in real-world datasets; LLaDA outperforms baseline planning approaches on all our metrics. Please check our website for more details: this https URL .
- [1141] arXiv:2402.05933 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Time Series Diffusion in the Frequency DomainComments: 27 pages, 12 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Fourier analysis has been an instrumental tool in the development of signal processing. This leads us to wonder whether this framework could similarly benefit generative modelling. In this paper, we explore this question through the scope of time series diffusion models. More specifically, we analyze whether representing time series in the frequency domain is a useful inductive bias for score-based diffusion models. By starting from the canonical SDE formulation of diffusion in the time domain, we show that a dual diffusion process occurs in the frequency domain with an important nuance: Brownian motions are replaced by what we call mirrored Brownian motions, characterized by mirror symmetries among their components. Building on this insight, we show how to adapt the denoising score matching approach to implement diffusion models in the frequency domain. This results in frequency diffusion models, which we compare to canonical time diffusion models. Our empirical evaluation on real-world datasets, covering various domains like healthcare and finance, shows that frequency diffusion models better capture the training distribution than time diffusion models. We explain this observation by showing that time series from these datasets tend to be more localized in the frequency domain than in the time domain, which makes them easier to model in the former case. All our observations point towards impactful synergies between Fourier analysis and diffusion models.
- [1142] arXiv:2402.05935 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language ModelsPeng Gao , Renrui Zhang , Chris Liu , Longtian Qiu , Siyuan Huang , Weifeng Lin , Shitian Zhao , Shijie Geng , Ziyi Lin , Peng Jin , Kaipeng Zhang , Wenqi Shao , Chao Xu , Conghui He , Junjun He , Hao Shao , Pan Lu , Hongsheng Li , Yu QiaoComments: Code and models are released at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: We propose SPHINX-X, an extensive Multimodality Large Language Model (MLLM) series developed upon SPHINX. To improve the architecture and training efficiency, we modify the SPHINX framework by removing redundant visual encoders, bypassing fully-padded sub-images with skip tokens, and simplifying multi-stage training into a one-stage all-in-one paradigm. To fully unleash the potential of MLLMs, we assemble a comprehensive multi-domain and multimodal dataset covering publicly available resources in language, vision, and vision-language tasks. We further enrich this collection with our curated OCR intensive and Set-of-Mark datasets, extending the diversity and generality. By training over different base LLMs including TinyLlama1.1B, InternLM2-7B, LLaMA2-13B, and Mixtral8x7B, we obtain a spectrum of MLLMs that vary in parameter size and multilingual capabilities. Comprehensive benchmarking reveals a strong correlation between the multi-modal performance with the data and parameter scales. Code and models are released at this https URL
- [1143] arXiv:2402.05940 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Causal Relationship Network of Risk Factors Impacting Workday Loss in Underground Coal MinesShangsi Ren , Cameron A. Beeche , Zhiyi Shi , Maria Acevedo Garcia , Katherine Zychowski , Shuguang Leng , Pedram Roghanchi , Jiantao PuComments: 5 figures 5 tablesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Methodology (stat.ME)
Abstract: This study aims to establish the causal relationship network between various factors leading to workday loss in underground coal mines using a novel causal artificial intelligence (AI) method. The analysis utilizes data obtained from the National Institute for Occupational Safety and Health (NIOSH). A total of 101,010 injury records from 3,982 unique underground coal mines spanning the years from 1990 to 2020 were extracted from the NIOSH database. Causal relationships were analyzed and visualized using a novel causal AI method called Grouped Greedy Equivalence Search (GGES). The impact of each variable on workday loss was assessed through intervention do-calculus adjustment (IDA) scores. Model training and validation were performed using the 10-fold cross-validation technique. Performance metrics, including adjacency precision (AP), adjacency recall (AR), arrowhead precision (AHP), and arrowhead recall (AHR), were utilized to evaluate the models. Findings revealed that after 2006, key direct causes of workday loss among mining employees included total mining experience, mean office employees, mean underground employees, county, and total mining experience (years). Total mining experience emerged as the most influential factor, whereas mean employees per mine exhibited the least influence. The analyses emphasized the significant role of total mining experience in determining workday loss. The models achieved optimal performance, with AP, AR, AHP, and AHR values measuring 0.694, 0.653, 0.386, and 0.345, respectively. This study demonstrates the feasibility of utilizing the new GGES method to clarify the causal factors behind the workday loss by analyzing employment demographics and injury records and establish their causal relationship network.
- [1144] arXiv:2402.05941 (cross-list from cs.IR) [ pdf , ps , other ]
-
Title: Character-based Outfit Generation with Vision-augmented Style Extraction via LLMsNajmeh Forouzandehmehr , Yijie Cao , Nikhil Thakurdesai , Ramin Giahi , Luyi Ma , Nima Farrokhsiar , Jianpeng Xu , Evren Korpeoglu , Kannan AchanComments: 7 pages, 4 figures, IEEE Big Data 2023 3rd Workshop on Multimodal AI (MMAI 2023), IEEE BigData 2023Subjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: The outfit generation problem involves recommending a complete outfit to a user based on their interests. Existing approaches focus on recommending items based on anchor items or specific query styles but do not consider customer interests in famous characters from movie, social media, etc. In this paper, we define a new Character-based Outfit Generation (COG) problem, designed to accurately interpret character information and generate complete outfit sets according to customer specifications such as age and gender. To tackle this problem, we propose a novel framework LVA-COG that leverages Large Language Models (LLMs) to extract insights from customer interests (e.g., character information) and employ prompt engineering techniques for accurate understanding of customer preferences. Additionally, we incorporate text-to-image models to enhance the visual understanding and generation (factual or counterfactual) of cohesive outfits. Our framework integrates LLMs with text-to-image models and improves the customer's approach to fashion by generating personalized recommendations. With experiments and case studies, we demonstrate the effectiveness of our solution from multiple dimensions.
- [1145] arXiv:2402.05942 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Cooperative Knowledge Distillation: A Learner Agnostic ApproachComments: 8 pages, 7 figures, AAAI24Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Knowledge distillation is a simple but powerful way to transfer knowledge between a teacher model to a student model. Existing work suffers from at least one of the following key limitations in terms of direction and scope of transfer which restrict its use: all knowledge is transferred from teacher to student regardless of whether or not that knowledge is useful, the student is the only one learning in this exchange, and typically distillation transfers knowledge only from a single teacher to a single student. We formulate a novel form of knowledge distillation in which many models can act as both students and teachers which we call cooperative distillation. The models cooperate as follows: a model (the student) identifies specific deficiencies in it's performance and searches for another model (the teacher) who encodes learned knowledge into instructional virtual instances via counterfactual instance generation. Because different models may have different strengths and weaknesses, all models can act as either students or teachers (cooperation) when appropriate and only distill knowledge in areas specific to their strengths (focus). Since counterfactuals as a paradigm are not tied to any specific algorithm, we can use this method to distill knowledge between learners of different architectures, algorithms, and even feature spaces. We demonstrate that our approach not only outperforms baselines such as transfer learning, self-supervised learning, and multiple knowledge distillation algorithms on several datasets, but it can also be used in settings where the aforementioned techniques cannot.
- [1146] arXiv:2402.05943 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: A hybrid IndRNNLSTM approach for real-time anomaly detection in software-defined networksSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)
Abstract: Anomaly detection in SDN using data flow prediction is a difficult task. This problem is included in the category of time series and regression problems. Machine learning approaches are challenging in this field due to the manual selection of features. On the other hand, deep learning approaches have important features due to the automatic selection of features. Meanwhile, RNN-based approaches have been used the most. The LSTM and GRU approaches learn dependent entities well; on the other hand, the IndRNN approach learns non-dependent entities in time series. The proposed approach tried to use a combination of IndRNN and LSTM approaches to learn dependent and non-dependent features. Feature selection approaches also provide a suitable view of features for the models; for this purpose, four feature selection models, Filter, Wrapper, Embedded, and Autoencoder were used. The proposed IndRNNLSTM algorithm, in combination with Embedded, was able to achieve MAE=1.22 and RMSE=9.92 on NSL-KDD data.
- [1147] arXiv:2402.05945 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Eliminating Information Leakage in Hard Concept Bottleneck Models with Supervised, Hierarchical Concept LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Concept Bottleneck Models (CBMs) aim to deliver interpretable and interventionable predictions by bridging features and labels with human-understandable concepts. While recent CBMs show promising potential, they suffer from information leakage, where unintended information beyond the concepts (either when concepts are represented with probabilities or binary states) are leaked to the subsequent label prediction. Consequently, distinct classes are falsely classified via indistinguishable concepts, undermining the interpretation and intervention of CBMs.
This paper alleviates the information leakage issue by introducing label supervision in concept predication and constructing a hierarchical concept set. Accordingly, we propose a new paradigm of CBMs, namely SupCBM, which achieves label predication via predicted concepts and a deliberately-designed intervention matrix. SupCBM focuses on concepts that are mostly relevant to the predicted label and only distinguishes classes when different concepts are presented. Our evaluations show that SupCBM outperforms SOTA CBMs over diverse datasets. It also manifests better generality across different backbone models. With proper quantification of information leakage in different CBMs, we demonstrate that SupCBM significantly reduces the information leakage. - [1148] arXiv:2402.05946 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Unveiling Latent Causal Rules: A Temporal Point Process Approach for Abnormal Event ExplanationComments: Accepted by AISTATS 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In high-stakes systems such as healthcare, it is critical to understand the causal reasons behind unusual events, such as sudden changes in patient's health. Unveiling the causal reasons helps with quick diagnoses and precise treatment planning. In this paper, we propose an automated method for uncovering "if-then" logic rules to explain observational events. We introduce temporal point processes to model the events of interest, and discover the set of latent rules to explain the occurrence of events. To achieve this, we employ an Expectation-Maximization (EM) algorithm. In the E-step, we calculate the likelihood of each event being explained by each discovered rule. In the M-step, we update both the rule set and model parameters to enhance the likelihood function's lower bound. Notably, we optimize the rule set in a differential manner. Our approach demonstrates accurate performance in both discovering rules and identifying root causes. We showcase its promising results using synthetic and real healthcare datasets.
- [1149] arXiv:2402.05950 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: SQT -- std $Q$-targetSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Std $Q$-target is a conservative, actor-critic, ensemble, $Q$-learning-based algorithm, which is based on a single key $Q$-formula: $Q$-networks standard deviation, which is an "uncertainty penalty", and, serves as a minimalistic solution to the problem of overestimation bias. We implement SQT on top of TD3/TD7 code and test it against the state-of-the-art (SOTA) actor-critic algorithms, DDPG, TD3 and TD7 on seven popular MuJoCo and Bullet tasks. Our results demonstrate SQT's $Q$-target formula superiority over TD3's $Q$-target formula as a conservative solution to overestimation bias in RL, while SQT shows a clear performance advantage on a wide margin over DDPG, TD3, and TD7 on all tasks.
- [1150] arXiv:2402.05951 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: MinMaxMin $Q$-learningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: MinMaxMin $Q$-learning is a novel optimistic Actor-Critic algorithm that addresses the problem of overestimation bias ($Q$-estimations are overestimating the real $Q$-values) inherent in conservative RL algorithms. Its core formula relies on the disagreement among $Q$-networks in the form of the min-batch MaxMin $Q$-networks distance which is added to the $Q$-target and used as the priority experience replay sampling-rule. We implement MinMaxMin on top of TD3 and TD7, subjecting it to rigorous testing against state-of-the-art continuous-space algorithms-DDPG, TD3, and TD7-across popular MuJoCo and Bullet environments. The results show a consistent performance improvement of MinMaxMin over DDPG, TD3, and TD7 across all tested tasks.
- [1151] arXiv:2402.05952 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Advancing Graph Representation Learning with Large Language Models: A Comprehensive Survey of TechniquesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: The integration of Large Language Models (LLMs) with Graph Representation Learning (GRL) marks a significant evolution in analyzing complex data structures. This collaboration harnesses the sophisticated linguistic capabilities of LLMs to improve the contextual understanding and adaptability of graph models, thereby broadening the scope and potential of GRL. Despite a growing body of research dedicated to integrating LLMs into the graph domain, a comprehensive review that deeply analyzes the core components and operations within these models is notably lacking. Our survey fills this gap by proposing a novel taxonomy that breaks down these models into primary components and operation techniques from a novel technical perspective. We further dissect recent literature into two primary components including knowledge extractors and organizers, and two operation techniques including integration and training stratigies, shedding light on effective model design and training strategies. Additionally, we identify and explore potential future research avenues in this nascent yet underexplored field, proposing paths for continued progress.
- [1152] arXiv:2402.05959 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Nature-Inspired Local PropagationSubjects: Machine Learning (cs.LG) ; Disordered Systems and Neural Networks (cond-mat.dis-nn); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Abstract: The spectacular results achieved in machine learning, including the recent advances in generative AI, rely on large data collections. On the opposite, intelligent processes in nature arises without the need for such collections, but simply by online processing of the environmental information. In particular, natural learning processes rely on mechanisms where data representation and learning are intertwined in such a way to respect spatiotemporal locality. This paper shows that such a feature arises from a pre-algorithmic view of learning that is inspired by related studies in Theoretical Physics. We show that the algorithmic interpretation of the derived "laws of learning", which takes the structure of Hamiltonian equations, reduces to Backpropagation when the speed of propagation goes to infinity. This opens the doors to machine learning studies based on full on-line information processing that are based the replacement of Backpropagation with the proposed spatiotemporal local algorithm.
- [1153] arXiv:2402.05963 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Frugal Actor-Critic: Sample Efficient Off-Policy Deep Reinforcement Learning Using Unique ExperiencesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Robotics (cs.RO); Systems and Control (eess.SY)
Abstract: Efficient utilization of the replay buffer plays a significant role in the off-policy actor-critic reinforcement learning (RL) algorithms used for model-free control policy synthesis for complex dynamical systems. We propose a method for achieving sample efficiency, which focuses on selecting unique samples and adding them to the replay buffer during the exploration with the goal of reducing the buffer size and maintaining the independent and identically distributed (IID) nature of the samples. Our method is based on selecting an important subset of the set of state variables from the experiences encountered during the initial phase of random exploration, partitioning the state space into a set of abstract states based on the selected important state variables, and finally selecting the experiences with unique state-reward combination by using a kernel density estimator. We formally prove that the off-policy actor-critic algorithm incorporating the proposed method for unique experience accumulation converges faster than the vanilla off-policy actor-critic algorithm. Furthermore, we evaluate our method by comparing it with two state-of-the-art actor-critic RL algorithms on several continuous control benchmarks available in the Gym environment. Experimental results demonstrate that our method achieves a significant reduction in the size of the replay buffer for all the benchmarks while achieving either faster convergent or better reward accumulation compared to the baseline algorithms.
- [1154] arXiv:2402.05966 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Rethink Model Re-Basin and the Linear Mode ConnectivityComments: 40 pagesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Recent studies suggest that with sufficiently wide models, most SGD solutions can, up to permutation, converge into the same basin. This phenomenon, known as the model re-basin regime, has significant implications for model averaging. However, current re-basin strategies are limited in effectiveness due to a lack of comprehensive understanding of underlying mechanisms. Addressing this gap, our work revisits standard practices and uncovers the frequent inadequacies of existing matching algorithms, which we show can be mitigated through proper re-normalization. By introducing a more direct analytical approach, we expose the interaction between matching algorithms and re-normalization processes. This perspective not only clarifies and refines previous findings but also facilitates novel insights. For instance, it connects the linear mode connectivity to pruning, motivating a lightweight yet effective post-pruning plug-in that can be directly merged with any existing pruning techniques. Our implementation is available at this https URL .
- [1155] arXiv:2402.05967 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: The last Dance : Robust backdoor attack via diffusion models and bayesian approachComments: Preprint (Last update): audio backdoor attack on Hugging Face's Transformer pre-trained models. This attack incorporates state-of-the-art Bayesian techniques, a modified Fokker-Planck equation (via Yang-Mills), and a diffusion model approachSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Signal Processing (eess.SP)
Abstract: Diffusion models are state-of-the-art deep learning generative models that are trained on the principle of learning forward and backward diffusion processes via the progressive addition of noise and denoising. In this paper, we aim to fool audio-based DNN models, such as those from the Hugging Face framework, primarily those that focus on audio, in particular transformer-based artificial intelligence models, which are powerful machine learning models that save time and achieve results faster and more efficiently. We demonstrate the feasibility of backdoor attacks (called `BacKBayDiffMod`) on audio transformers derived from Hugging Face, a popular framework in the world of artificial intelligence research. The backdoor attack developed in this paper is based on poisoning model training data uniquely by incorporating backdoor diffusion sampling and a Bayesian approach to the distribution of poisoned data.
- [1156] arXiv:2402.05968 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Federated Learning Priorities Under the European Union Artificial Intelligence ActHerbert Woisetschläger , Alexander Erben , Bill Marino , Shiqiang Wang , Nicholas D. Lane , Ruben Mayer , Hans-Arno JacobsenSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: The age of AI regulation is upon us, with the European Union Artificial Intelligence Act (AI Act) leading the way. Our key inquiry is how this will affect Federated Learning (FL), whose starting point of prioritizing data privacy while performing ML fundamentally differs from that of centralized learning. We believe the AI Act and future regulations could be the missing catalyst that pushes FL toward mainstream adoption. However, this can only occur if the FL community reprioritizes its research focus. In our position paper, we perform a first-of-its-kind interdisciplinary analysis (legal and ML) of the impact the AI Act may have on FL and make a series of observations supporting our primary position through quantitative and qualitative analysis. We explore data governance issues and the concern for privacy. We establish new challenges regarding performance and energy efficiency within lifecycle monitoring. Taken together, our analysis suggests there is a sizable opportunity for FL to become a crucial component of AI Act-compliant ML systems and for the new regulation to drive the adoption of FL techniques in general. Most noteworthy are the opportunities to defend against data bias and enhance private and secure computation
- [1157] arXiv:2402.05970 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Modeling Spatio-temporal Dynamical Systems with Neural Discrete Learning and Levels-of-ExpertsKun Wang , Hao Wu , Guibin Zhang , Junfeng Fang , Yuxuan Liang , Yuankai Wu , Roger Zimmermann , Yang WangSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In this paper, we address the issue of modeling and estimating changes in the state of the spatio-temporal dynamical systems based on a sequence of observations like video frames. Traditional numerical simulation systems depend largely on the initial settings and correctness of the constructed partial differential equations (PDEs). Despite recent efforts yielding significant success in discovering data-driven PDEs with neural networks, the limitations posed by singular scenarios and the absence of local insights prevent them from performing effectively in a broader real-world context. To this end, this paper propose the universal expert module -- that is, optical flow estimation component, to capture the evolution laws of general physical processes in a data-driven fashion. To enhance local insight, we painstakingly design a finer-grained physical pipeline, since local characteristics may be influenced by various internal contextual information, which may contradict the macroscopic properties of the whole system. Further, we harness currently popular neural discrete learning to unveil the underlying important features in its latent space, this process better injects interpretability, which can help us obtain a powerful prior over these discrete random variables. We conduct extensive experiments and ablations to demonstrate that the proposed framework achieves large performance margins, compared with the existing SOTA baselines.
- [1158] arXiv:2402.05975 (cross-list from eess.IV) [ pdf , ps , other ]
-
Title: A Deep Learning Approach for Brain Tumor Classification and Segmentation Using a Multiscale Convolutional Neural NetworkFrancisco Javier Díaz-Pernas , Mario Martínez-Zarzuela , Míriam Antón-Rodríguez , David González-OrtegaComments: 14 pagesJournal-ref: Healthcare 2021, 9, 153Subjects: Image and Video Processing (eess.IV) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: In this paper, we present a fully automatic brain tumor segmentation and classification model using a Deep Convolutional Neural Network that includes a multiscale approach. One of the differences of our proposal with respect to previous works is that input images are processed in three spatial scales along different processing pathways. This mechanism is inspired in the inherent operation of the Human Visual System. The proposed neural model can analyze MRI images containing three types of tumors: meningioma, glioma, and pituitary tumor, over sagittal, coronal, and axial views and does not need preprocessing of input images to remove skull or vertebral column parts in advance. The performance of our method on a publicly available MRI image dataset of 3064 slices from 233 patients is compared with previously classical machine learning and deep learning published methods. In the comparison, our method remarkably obtained a tumor classification accuracy of 0.973, higher than the other approaches using the same database.
- [1159] arXiv:2402.05976 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: RankSum An unsupervised extractive text summarization based on rank fusionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In this paper, we propose Ranksum, an approach for extractive text summarization of single documents based on the rank fusion of four multi-dimensional sentence features extracted for each sentence: topic information, semantic content, significant keywords, and position. The Ranksum obtains the sentence saliency rankings corresponding to each feature in an unsupervised way followed by the weighted fusion of the four scores to rank the sentences according to their significance. The scores are generated in completely unsupervised way, and a labeled document set is required to learn the fusion weights. Since we found that the fusion weights can generalize to other datasets, we consider the Ranksum as an unsupervised approach. To determine topic rank, we employ probabilistic topic models whereas semantic information is captured using sentence embeddings. To derive rankings using sentence embeddings, we utilize Siamese networks to produce abstractive sentence representation and then we formulate a novel strategy to arrange them in their order of importance. A graph-based strategy is applied to find the significant keywords and related sentence rankings in the document. We also formulate a sentence novelty measure based on bigrams, trigrams, and sentence embeddings to eliminate redundant sentences from the summary. The ranks of all the sentences computed for each feature are finally fused to get the final score for each sentence in the document. We evaluate our approach on publicly available summarization datasets CNN/DailyMail and DUC 2002. Experimental results show that our approach outperforms other existing state-of-the-art summarization methods.
- [1160] arXiv:2402.05977 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Tool wear monitoring using an online, automatic and low cost system based on local textureSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: In this work we propose a new online, low cost and fast approach based on computer vision and machine learning to determine whether cutting tools used in edge profile milling processes are serviceable or disposable based on their wear level. We created a new dataset of 254 images of edge profile cutting heads which is, to the best of our knowledge, the first publicly available dataset with enough quality for this purpose. All the inserts were segmented and their cutting edges were cropped, obtaining 577 images of cutting edges: 301 functional and 276 disposable. The proposed method is based on (1) dividing the cutting edge image in different regions, called Wear Patches (WP), (2) characterising each one as worn or serviceable using texture descriptors based on different variants of Local Binary Patterns (LBP) and (3) determine, based on the state of these WP, if the cutting edge (and, therefore, the tool) is serviceable or disposable. We proposed and assessed five different patch division configurations. The individual WP were classified by a Support Vector Machine (SVM) with an intersection kernel. The best patch division configuration and texture descriptor for the WP achieves an accuracy of 90.26% in the detection of the disposable cutting edges. These results show a very promising opportunity for automatic wear monitoring in edge profile milling processes.
- [1161] arXiv:2402.05978 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Combining shape and contour features to improve tool wear monitoring in milling processesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: In this paper, a new system based on combinations of a shape descriptor and a contour descriptor has been proposed for classifying inserts in milling processes according to their wear level following a computer vision based approach. To describe the wear region shape we have proposed a new descriptor called ShapeFeat and its contour has been characterized using the method BORCHIZ that, to the best of our knowledge, achieves the best performance for tool wear monitoring following a computer vision-based approach. Results show that the combination of BORCHIZ with ShapeFeat using a late fusion method improves the classification performance significantly, obtaining an accuracy of 91.44% in the binary classification (i.e. the classification of the wear as high or low) and 82.90% using three target classes (i.e. classification of the wear as high, medium or low). These results outperform the ones obtained by both descriptors used on their own, which achieve accuracies of 88.70 and 80.67% for two and three classes, respectively, using ShapeFeat and 87.06 and 80.24% with B-ORCHIZ. This study yielded encouraging results for the manufacturing community in order to classify automatically the inserts in terms of their wear for milling processes.
- [1162] arXiv:2402.05979 (cross-list from cs.SE) [ pdf , ps , html , other ]
-
Title: On the Standardization of Behavioral Use Clauses and Their Adoption for Responsible Licensing of AIDaniel McDuff , Tim Korjakow , Scott Cambo , Jesse Josua Benjamin , Jenny Lee , Yacine Jernite , Carlos Muñoz Ferrandis , Aaron Gokaslan , Alek Tarkowski , Joseph Lindley , A. Feder Cooper , Danish ContractorSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: Growing concerns over negligent or malicious uses of AI have increased the appetite for tools that help manage the risks of the technology. In 2018, licenses with behaviorial-use clauses (commonly referred to as Responsible AI Licenses) were proposed to give developers a framework for releasing AI assets while specifying their users to mitigate negative applications. As of the end of 2023, on the order of 40,000 software and model repositories have adopted responsible AI licenses licenses. Notable models licensed with behavioral use clauses include BLOOM (language) and LLaMA2 (language), Stable Diffusion (image), and GRID (robotics). This paper explores why and how these licenses have been adopted, and why and how they have been adapted to fit particular use cases. We use a mixed-methods methodology of qualitative interviews, clustering of license clauses, and quantitative analysis of license adoption. Based on this evidence we take the position that responsible AI licenses need standardization to avoid confusing users or diluting their impact. At the same time, customization of behavioral restrictions is also appropriate in some contexts (e.g., medical domains). We advocate for ``standardized customization'' that can meet users' needs and can be supported via tooling.
- [1163] arXiv:2402.05980 (cross-list from cs.SE) [ pdf , ps , html , other ]
-
Title: Do Large Code Models Understand Programming Concepts? A Black-box ApproachAshish Hooda , Mihai Christodorescu , Miltiadis Allamanis , Aaron Wilson , Kassem Fawaz , Somesh JhaSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Programming Languages (cs.PL)
Abstract: Large Language Models' success on text generation has also made them better at code generation and coding tasks. While a lot of work has demonstrated their remarkable performance on tasks such as code completion and editing, it is still unclear as to why. We help bridge this gap by exploring to what degree auto-regressive models understand the logical constructs of the underlying programs. We propose Counterfactual Analysis for Programming Concept Predicates (CACP) as a counterfactual testing framework to evaluate whether Large Code Models understand programming concepts. With only black-box access to the model, we use CACP to evaluate ten popular Large Code Models for four different programming concepts. Our findings suggest that current models lack understanding of concepts such as data flow and control flow.
- [1164] arXiv:2402.06004 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Memory-Efficient Vision Transformers: An Activation-Aware Mixed-Rank Compression StrategySubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: As Vision Transformers (ViTs) increasingly set new benchmarks in computer vision, their practical deployment on inference engines is often hindered by their significant memory bandwidth and (on-chip) memory footprint requirements. This paper addresses this memory limitation by introducing an activation-aware model compression methodology that uses selective low-rank weight tensor approximations of different layers to reduce the parameter count of ViTs. The key idea is to decompose the weight tensors into a sum of two parameter-efficient tensors while minimizing the error between the product of the input activations with the original weight tensor and the product of the input activations with the approximate tensor sum. This approximation is further refined by adopting an efficient layer-wise error compensation technique that uses the gradient of the layer's output loss. The combination of these techniques achieves excellent results while it avoids being trapped in a shallow local minimum early in the optimization process and strikes a good balance between the model compression and output accuracy. Notably, the presented method significantly reduces the parameter count of DeiT-B by 60% with less than 1% accuracy drop on the ImageNet dataset, overcoming the usual accuracy degradation seen in low-rank approximations. In addition to this, the presented compression technique can compress large DeiT/ViT models to have about the same model size as smaller DeiT/ViT variants while yielding up to 1.8% accuracy gain. These results highlight the efficacy of our approach, presenting a viable solution for embedding ViTs in memory-constrained environments without compromising their performance.
- [1165] arXiv:2402.06023 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Decision Theory-Guided Deep Reinforcement Learning for Fast LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)
Abstract: This paper introduces a novel approach, Decision Theory-guided Deep Reinforcement Learning (DT-guided DRL), to address the inherent cold start problem in DRL. By integrating decision theory principles, DT-guided DRL enhances agents' initial performance and robustness in complex environments, enabling more efficient and reliable convergence during learning. Our investigation encompasses two primary problem contexts: the cart pole and maze navigation challenges. Experimental results demonstrate that the integration of decision theory not only facilitates effective initial guidance for DRL agents but also promotes a more structured and informed exploration strategy, particularly in environments characterized by large and intricate state spaces. The results of experiment demonstrate that DT-guided DRL can provide significantly higher rewards compared to regular DRL. Specifically, during the initial phase of training, the DT-guided DRL yields up to an 184% increase in accumulated reward. Moreover, even after reaching convergence, it maintains a superior performance, ending with up to 53% more reward than standard DRL in large maze problems. DT-guided DRL represents an advancement in mitigating a fundamental challenge of DRL by leveraging functions informed by human (designer) knowledge, setting a foundation for further research in this promising interdisciplinary domain.
- [1166] arXiv:2402.06026 (cross-list from quant-ph) [ pdf , ps , html , other ]
-
Title: Quantum neural network with ensemble learning to mitigate barren plateaus and cost function concentrationSubjects: Quantum Physics (quant-ph) ; Artificial Intelligence (cs.AI)
Abstract: The rapid development of quantum computers promises transformative impacts across diverse fields of science and technology. Quantum neural networks (QNNs), as a forefront application, hold substantial potential. Despite the multitude of proposed models in the literature, persistent challenges, notably the vanishing gradient (VG) and cost function concentration (CFC) problems, impede their widespread success. In this study, we introduce a novel approach to quantum neural network construction, specifically addressing the issues of VG and CFC. Our methodology employs ensemble learning, advocating for the simultaneous deployment of multiple quantum circuits with a depth equal to $1$, a departure from the conventional use of a single quantum circuit with depth $L$. We assess the efficacy of our proposed model through a comparative analysis with a conventionally constructed QNN. The evaluation unfolds in the context of a classification problem, yielding valuable insights into the potential advantages of our innovative approach.
- [1167] arXiv:2402.06030 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Game-theoretic Counterfactual Explanation for Graph Neural NetworksComments: Accepted to WWW 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Graph Neural Networks (GNNs) have been a powerful tool for node classification tasks in complex networks. However, their decision-making processes remain a black-box to users, making it challenging to understand the reasoning behind their predictions. Counterfactual explanations (CFE) have shown promise in enhancing the interpretability of machine learning models. Prior approaches to compute CFE for GNNS often are learning-based approaches that require training additional graphs. In this paper, we propose a semivalue-based, non-learning approach to generate CFE for node classification tasks, eliminating the need for any additional training. Our results reveals that computing Banzhaf values requires lower sample complexity in identifying the counterfactual explanations compared to other popular methods such as computing Shapley values. Our empirical evidence indicates computing Banzhaf values can achieve up to a fourfold speed up compared to Shapley values. We also design a thresholding method for computing Banzhaf values and show theoretical and empirical results on its robustness in noisy environments, making it superior to Shapley values. Furthermore, the thresholded Banzhaf values are shown to enhance efficiency without compromising the quality (i.e., fidelity) in the explanations in three popular graph datasets.
- [1168] arXiv:2402.06034 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Optimizing Predictive AI in Physical Design Flows with Mini Pixel Batch Gradient DescentComments: 7 pages, 2 figures, preprintSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Exploding predictive AI has enabled fast yet effective evaluation and decision-making in modern chip physical design flows. State-of-the-art frameworks typically include the objective of minimizing the mean square error (MSE) between the prediction and the ground truth. We argue the averaging effect of MSE induces limitations in both model training and deployment, and good MSE behavior does not guarantee the capability of these models to assist physical design flows which are likely sabotaged due to a small portion of prediction error. To address this, we propose mini-pixel batch gradient descent (MPGD), a plug-and-play optimization algorithm that takes the most informative entries into consideration, offering probably faster and better convergence. Experiments on representative benchmark suits show the significant benefits of MPGD on various physical design prediction tasks using CNN or Graph-based models.
- [1169] arXiv:2402.06038 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Contrastive Approach to Prior Free Positive Unlabeled LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Positive Unlabeled (PU) learning refers to the task of learning a binary classifier given a few labeled positive samples, and a set of unlabeled samples (which could be positive or negative). In this paper, we propose a novel PU learning framework, that starts by learning a feature space through pretext-invariant representation learning and then applies pseudo-labeling to the unlabeled examples, leveraging the concentration property of the embeddings. Overall, our proposed approach handily outperforms state-of-the-art PU learning methods across several standard PU benchmark datasets, while not requiring a-priori knowledge or estimate of class prior. Remarkably, our method remains effective even when labeled data is scant, where most PU learning algorithms falter. We also provide simple theoretical analysis motivating our proposed algorithms and establish generalization guarantee for our approach.
- [1170] arXiv:2402.06046 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: Anatomy of a Robotaxi Crash: Lessons from the Cruise Pedestrian Dragging MishapComments: 15 pages, 2 figuresSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: An October 2023 crash between a GM Cruise robotaxi and a pedestrian in San Francisco resulted not only in a severe injury, but also dramatic upheaval at that company that will likely have lasting effects throughout the industry. Is-sues stem not just from the loss events themselves, but also from how Cruise mishandled dealing with their robotaxi dragging a pedestrian under the vehicle after the initial post-crash stop. External investigation reports provide raw material describing the incident and critique the company's response from a regulatory point of view, but exclude safety engineering recommendations from scope. We highlight specific facts and relationships among events by tying together different pieces of the external report material. We then explore safety lessons that might be learned related to: recognizing and responding to nearby mishaps, building an accurate world model of a post-collision scenario, the in-adequacy of a so-called "minimal risk condition" strategy in complex situations, poor organizational discipline in responding to a mishap, overly aggressive post-collision automation choices that made a bad situation worse, and a reluctance to admit to a mishap causing much worse organizational harm down-stream.
- [1171] arXiv:2402.06053 (cross-list from cs.HC) [ pdf , ps , html , other ]
-
Title: Randomness Is All You Need: Semantic Traversal of Problem-Solution Spaces with Large Language ModelsSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: We present a novel approach to exploring innovation problem and solution domains using LLM fine-tuning with a custom idea database. By semantically traversing the bi-directional problem and solution tree at different temperature levels we achieve high diversity in solution edit distance while still remaining close to the original problem statement semantically. In addition to finding a variety of solutions to a given problem, this method can also be used to refine and clarify the original problem statement. As further validation of the approach, we implemented a proof-of-concept Slack bot to serve as an innovation assistant.
- [1172] arXiv:2402.06075 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Scaling Artificial Intelligence for Digital Wargaming in Support of Decision-MakingJournal-ref: NATO STO-MP-MSG-207 2023Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In this unprecedented era of technology-driven transformation, it becomes more critical than ever that we aggressively invest in developing robust artificial intelligence (AI) for wargaming in support of decision-making. By advancing AI-enabled systems and pairing these with human judgment, we will be able to enhance all-domain awareness, improve the speed and quality of our decision cycles, offer recommendations for novel courses of action, and more rapidly counter our adversary's actions. It therefore becomes imperative that we accelerate the development of AI to help us better address the complexity of modern challenges and dilemmas that currently requires human intelligence and, if possible, attempt to surpass human intelligence--not to replace humans, but to augment and better inform human decision-making at machine speed. Although deep reinforcement learning continues to show promising results in intelligent agent behavior development for the long-horizon, complex tasks typically found in combat modeling and simulation, further research is needed to enable the scaling of AI to deal with these intricate and expansive state-spaces characteristic of wargaming for either concept development, education, or analysis. To help address this challenge, in our research, we are developing and implementing a hierarchical reinforcement learning framework that includes a multi-model approach and dimension-invariant observation abstractions.
- [1173] arXiv:2402.06078 (cross-list from cs.RO) [ pdf , ps , html , other ]
-
Title: Gaussian Mixture Models for Affordance Learning using Bayesian NetworksComments: IEEE/RSJ International Conference on Intelligent Robots and Systems 2010Journal-ref: Published on the Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems 2010Subjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: Affordances are fundamental descriptors of relationships between actions, objects and effects. They provide the means whereby a robot can predict effects, recognize actions, select objects and plan its behavior according to desired goals. This paper approaches the problem of an embodied agent exploring the world and learning these affordances autonomously from its sensory experiences. Models exist for learning the structure and the parameters of a Bayesian Network encoding this knowledge. Although Bayesian Networks are capable of dealing with uncertainty and redundancy, previous work considered complete observability of the discrete sensory data, which may lead to hard errors in the presence of noise. In this paper we consider a probabilistic representation of the sensors by Gaussian Mixture Models (GMMs) and explicitly taking into account the probability distribution contained in each discrete affordance concept, which can lead to a more correct learning.
- [1174] arXiv:2402.06079 (cross-list from q-bio.GN) [ pdf , ps , html , other ]
-
Title: DiscDiff: Latent Diffusion Model for DNA Sequence GenerationZehui Li , Yuhao Ni , William A V Beardall , Guoxuan Xia , Akashaditya Das , Guy-Bart Stan , Yiren ZhaoComments: Different from the prior work "Latent Diffusion Model for DNA Sequence Generation" ( arXiv:2310.06150 ), we updated the evaluation framework and compared the DiscDiff with other methods comprehensively. In addition, a post-training framework is proposed to increase the quality of generated sequencesSubjects: Genomics (q-bio.GN) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This paper introduces a novel framework for DNA sequence generation, comprising two key components: DiscDiff, a Latent Diffusion Model (LDM) tailored for generating discrete DNA sequences, and Absorb-Escape, a post-training algorithm designed to refine these sequences. Absorb-Escape enhances the realism of the generated sequences by correcting `round errors' inherent in the conversion process between latent and input spaces. Our approach not only sets new standards in DNA sequence generation but also demonstrates superior performance over existing diffusion models, in generating both short and long DNA sequences. Additionally, we introduce EPD-GenDNA, the first comprehensive, multi-species dataset for DNA generation, encompassing 160,000 unique sequences from 15 species. We hope this study will advance the generative modelling of DNA, with potential implications for gene therapy and protein production.
- [1175] arXiv:2402.06082 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: SubGen: Token Generation in Sublinear Time and MemorySubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS)
Abstract: Despite the significant success of large language models (LLMs), their extensive memory requirements pose challenges for deploying them in long-context token generation. The substantial memory footprint of LLM decoders arises from the necessity to store all previous tokens in the attention module, a requirement imposed by key-value (KV) caching. In this work, our focus is on developing an efficient compression technique for the KV cache. Empirical evidence indicates a significant clustering tendency within key embeddings in the attention module. Building on this key insight, we have devised a novel caching method with sublinear complexity, employing online clustering on key tokens and online $\ell_2$ sampling on values. The result is a provably accurate and efficient attention decoding algorithm, termed SubGen. Not only does this algorithm ensure a sublinear memory footprint and sublinear time complexity, but we also establish a tight error bound for our approach. Empirical evaluations on long-context question-answering tasks demonstrate that SubGen significantly outperforms existing and state-of-the-art KV cache compression methods in terms of performance and efficiency.
- [1176] arXiv:2402.06086 (cross-list from cs.DC) [ pdf , ps , other ]
-
Title: Rhizomes and Diffusions for Processing Highly Skewed Graphs on Fine-Grain Message-Driven SystemsComments: arXiv admin note: text overlap with arXiv:2402.02576Subjects: Distributed, Parallel, and Cluster Computing (cs.DC) ; Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS)
Abstract: The paper provides a unified co-design of 1) a programming and execution model that allows spawning tasks from within the vertex data at runtime, 2) language constructs for \textit{actions} that send work to where the data resides, combining parallel expressiveness of local control objects (LCOs) to implement asynchronous graph processing primitives, 3) and an innovative vertex-centric data-structure, using the concept of Rhizomes, that parallelizes both the out and in-degree load of vertex objects across many cores and yet provides a single programming abstraction to the vertex objects. The data structure hierarchically parallelizes the out-degree load of vertices and the in-degree load laterally. The rhizomes internally communicate and remain consistent, using event-driven synchronization mechanisms, to provide a unified and correct view of the vertex.
Simulated experimental results show performance gains for BFS, SSSP, and Page Rank on large chip sizes for the tested input graph datasets containing highly skewed degree distribution. The improvements come from the ability to express and create fine-grain dynamic computing task in the form of \textit{actions}, language constructs that aid the compiler to generate code that the runtime system uses to optimally schedule tasks, and the data structure that shares both in and out-degree compute workload among memory-processing elements. - [1177] arXiv:2402.06104 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Function Aligned Regression: A Method Explicitly Learns Functional Derivatives from DataComments: 24 pages excluding references, 12 figures, 4 tablesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Regression is a fundamental task in machine learning that has garnered extensive attention over the past decades. The conventional approach for regression involves employing loss functions that primarily concentrate on aligning model prediction with the ground truth for each individual data sample, which, as we show, can result in sub-optimal prediction of the relationships between the different samples. Recent research endeavors have introduced novel perspectives by incorporating label similarity information to regression. However, a notable gap persists in these approaches when it comes to fully capturing the intricacies of the underlying ground truth function. In this work, we propose FAR (Function Aligned Regression) as a arguably better and more efficient solution to fit the underlying function of ground truth by capturing functional derivatives. We demonstrate the effectiveness of the proposed method practically on 2 synthetic datasets and on 8 extensive real-world tasks from 6 benchmark datasets with other 8 competitive baselines. The code is open-sourced at \url{ this https URL }.
- [1178] arXiv:2402.06107 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Multiple Instance Learning for Cheating Detection and Localization in Online ExaminationsComments: 12 pages, 7 figuresJournal-ref: IEEE Transactions on Cognitive and Developmental Systems 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Abstract: The spread of the Coronavirus disease-2019 epidemic has caused many courses and exams to be conducted online. The cheating behavior detection model in examination invigilation systems plays a pivotal role in guaranteeing the equality of long-distance examinations. However, cheating behavior is rare, and most researchers do not comprehensively take into account features such as head posture, gaze angle, body posture, and background information in the task of cheating behavior detection. In this paper, we develop and present CHEESE, a CHEating detection framework via multiplE inStancE learning. The framework consists of a label generator that implements weak supervision and a feature encoder to learn discriminative features. In addition, the framework combines body posture and background features extracted by 3D convolution with eye gaze, head posture and facial features captured by OpenFace 2.0. These features are fed into the spatio-temporal graph module by stitching to analyze the spatio-temporal changes in video clips to detect the cheating behaviors. Our experiments on three datasets, UCF-Crime, ShanghaiTech and Online Exam Proctoring (OEP), prove the effectiveness of our method as compared to the state-of-the-art approaches, and obtain the frame-level AUC score of 87.58% on the OEP dataset.
- [1179] arXiv:2402.06116 (cross-list from cs.RO) [ pdf , ps , html , other ]
-
Title: LLMs for Coding and Robotics EducationPeng Shu , Huaqin Zhao , Hanqi Jiang , Yiwei Li , Shaochen Xu , Yi Pan , Zihao Wu , Zhengliang Liu , Guoyu Lu , Le Guan , Gong Chen , Xianqiao Wang Tianming LiuComments: 20 pages, 6 figures, 1 tableSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: Large language models and multimodal large language models have revolutionized artificial intelligence recently. An increasing number of regions are now embracing these advanced technologies. Within this context, robot coding education is garnering increasing attention. To teach young children how to code and compete in robot challenges, large language models are being utilized for robot code explanation, generation, and modification. In this paper, we highlight an important trend in robot coding education. We test several mainstream large language models on both traditional coding tasks and the more challenging task of robot code generation, which includes block diagrams. Our results show that GPT-4V outperforms other models in all of our tests but struggles with generating block diagram images.
- [1180] arXiv:2402.06118 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward ModelingComments: 10 pages, 3 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: By combining natural language understanding, generation capabilities, and breadth of knowledge of large language models with image perception, recent large vision language models (LVLMs) have shown unprecedented visual reasoning capabilities. However, the generated text often suffers from inaccurate grounding in the visual input, resulting in errors such as hallucination of nonexistent scene elements, missing significant parts of the scene, and inferring incorrect attributes of and relationships between objects. To address these issues, we introduce a novel framework, ViGoR(Visual Grounding Through Fine-Grained Reward Modeling) that utilizes fine-grained reward modeling to significantly enhance the visual grounding of LVLMs over pre-trained baselines. This improvement is efficiently achieved using much cheaper human evaluations instead of full supervisions, as well as automated methods. We show the effectiveness of our approach through a variety of evaluation methods and benchmarks. Additionally, we plan to release our human annotation comprising approximately 16,000 images and generated text pairs with fine-grained evaluations to contribute to related research in the community.
- [1181] arXiv:2402.06126 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Learn To be Efficient: Build Structured Sparsity in Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Large Language Models (LLMs) have achieved remarkable success with their billion-level parameters, yet they incur high inference overheads. The emergence of activation sparsity in LLMs provides a natural approach to reduce this cost by involving only parts of the parameters for inference. Existing methods only focus on utilizing this naturally formed activation sparsity, overlooking the potential for further amplifying this inherent sparsity. In this paper, we hypothesize that LLMs can learn to be efficient by achieving more structured activation sparsity. To achieve this, we introduce a novel algorithm, Learn-To-be-Efficient (LTE), designed to train efficiency-aware LLMs to learn to activate fewer neurons and achieve a better trade-off between sparsity and performance. Furthermore, unlike SOTA MoEfication methods, which mainly focus on ReLU-based models, LTE can also be applied to LLMs like GPT and LLaMA with soft activation functions. We evaluate LTE on four models and eleven datasets. The experiments show that LTE achieves a better trade-off between sparsity and task performance. For instance, LTE with LLaMA provides a 1.83x-2.59x FLOPs speed-up on language generation tasks, outperforming the state-of-the-art methods.
- [1182] arXiv:2402.06128 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Rethinking Node-wise Propagation for Large-scale Graph LearningComments: Accepted by WWW 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Abstract: Scalable graph neural networks (GNNs) have emerged as a promising technique, which exhibits superior predictive performance and high running efficiency across numerous large-scale graph-based web applications. However, (i) Most scalable GNNs tend to treat all nodes in graphs with the same propagation rules, neglecting their topological uniqueness; (ii) Existing node-wise propagation optimization strategies are insufficient on web-scale graphs with intricate topology, where a full portrayal of nodes' local properties is required. Intuitively, different nodes in web-scale graphs possess distinct topological roles, and therefore propagating them indiscriminately or neglect local contexts may compromise the quality of node representations. This intricate topology in web-scale graphs cannot be matched by small-scale scenarios. To address the above issues, we propose \textbf{A}daptive \textbf{T}opology-aware \textbf{P}ropagation (ATP), which reduces potential high-bias propagation and extracts structural patterns of each node in a scalable manner to improve running efficiency and predictive performance. Remarkably, ATP is crafted to be a plug-and-play node-wise propagation optimization strategy, allowing for offline execution independent of the graph learning process in a new perspective. Therefore, this approach can be seamlessly integrated into most scalable GNNs while remain orthogonal to existing node-wise propagation optimization strategies. Extensive experiments on 12 datasets, including the most representative large-scale ogbn-papers100M, have demonstrated the effectiveness of ATP. Specifically, ATP has proven to be efficient in improving the performance of prevalent scalable GNNs for semi-supervised node classification while addressing redundant computational costs.
- [1183] arXiv:2402.06158 (cross-list from cs.DS) [ pdf , ps , html , other ]
-
Title: Assortment Planning with Sponsored ProductsSubjects: Data Structures and Algorithms (cs.DS) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract: In the rapidly evolving landscape of retail, assortment planning plays a crucial role in determining the success of a business. With the rise of sponsored products and their increasing prominence in online marketplaces, retailers face new challenges in effectively managing their product assortment in the presence of sponsored products. Remarkably, previous research in assortment planning largely overlooks the existence of sponsored products and their potential impact on overall recommendation effectiveness. Instead, they commonly make the simplifying assumption that all products are either organic or non-sponsored. This research gap underscores the necessity for a more thorough investigation of the assortment planning challenge when sponsored products are in play. We formulate the assortment planning problem in the presence of sponsored products as a combinatorial optimization task. The ultimate objective is to compute an assortment plan that optimizes expected revenue while considering the specific requirements of placing sponsored products strategically.
- [1184] arXiv:2402.06165 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Learning Contrastive Feature Representations for Facial Action Unit DetectionComments: 11 pages, 3 figures, submitted to an IEEE journalSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The predominant approach to facial action unit (AU) detection revolves around a supervised multi-label binary classification problem. Existing methodologies often encode pixel-level information of AUs, thereby imposing substantial demands on model complexity and expressiveness. Moreover, this practice elevates the susceptibility to overfitting due to the presence of noisy AU labels. In the present study, we introduce a contrastive learning framework enhanced by both supervised and self-supervised signals. The objective is to acquire discriminative features, deviating from the conventional pixel-level learning paradigm within the domain of AU detection. To address the challenge posed by noisy AU labels, we augment the supervised signal through the introduction of a self-supervised signal. This augmentation is achieved through positive sample sampling, encompassing three distinct types of positive sample pairs. Furthermore, to mitigate the imbalanced distribution of each AU type, we employ an importance re-weighting strategy tailored for minority AUs. The resulting loss, denoted as AUNCE, is proposed to encapsulate this strategy. Our experimental assessments, conducted on two widely-utilized benchmark datasets (BP4D and DISFA), underscore the superior performance of our approach compared to state-of-the-art methods in the realm of AU detection.
- [1185] arXiv:2402.06178 (cross-list from cs.SD) [ pdf , ps , html , other ]
-
Title: MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion ModelsYixiao Zhang , Yukara Ikemiya , Gus Xia , Naoki Murata , Marco Martínez , Wei-Hsiang Liao , Yuki Mitsufuji , Simon DixonComments: Project page: this https URLSubjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
Abstract: Recent advances in text-to-music generation models have opened new avenues in musical creativity. However, music generation usually involves iterative refinements, and how to edit the generated music remains a significant challenge. This paper introduces a novel approach to the editing of music generated by such models, enabling the modification of specific attributes, such as genre, mood and instrument, while maintaining other aspects unchanged. Our method transforms text editing to \textit{latent space manipulation} while adding an extra constraint to enforce consistency. It seamlessly integrates with existing pretrained text-to-music diffusion models without requiring additional training. Experimental results demonstrate superior performance over both zero-shot and certain supervised baselines in style and timbre transfer evaluations. Additionally, we showcase the practical applicability of our approach in real-world music editing scenarios.
- [1186] arXiv:2402.06185 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Development and validation of an artificial intelligence model to accurately predict spinopelvic parametersEdward S. Harake , Joseph R. Linzey , Cheng Jiang , Rushikesh S. Joshi , Mark M. Zaki , Jaes C. Jones , Siri S. Khalsa , John H. Lee , Zachary Wilseck , Jacob R. Joseph , Todd C. Hollon , Paul ParkComments: 10 pages, 5 figures, to appear in Journal of Neurosurgery: SpineSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Objective. Achieving appropriate spinopelvic alignment has been shown to be associated with improved clinical symptoms. However, measurement of spinopelvic radiographic parameters is time-intensive and interobserver reliability is a concern. Automated measurement tools have the promise of rapid and consistent measurements, but existing tools are still limited by some degree of manual user-entry requirements. This study presents a novel artificial intelligence (AI) tool called SpinePose that automatically predicts spinopelvic parameters with high accuracy without the need for manual entry.
Methods. SpinePose was trained and validated on 761 sagittal whole-spine X-rays to predict sagittal vertical axis (SVA), pelvic tilt (PT), pelvic incidence (PI), sacral slope (SS), lumbar lordosis (LL), T1-pelvic angle (T1PA), and L1-pelvic angle (L1PA). A separate test set of 40 X-rays was labeled by 4 reviewers, including fellowship-trained spine surgeons and a fellowship-trained radiologist with neuroradiology subspecialty certification. Median errors relative to the most senior reviewer were calculated to determine model accuracy on test images. Intraclass correlation coefficients (ICC) were used to assess inter-rater reliability.
Results. SpinePose exhibited the following median (interquartile range) parameter errors: SVA: 2.2(2.3)mm, p=0.93; PT: 1.3(1.2)°, p=0.48; SS: 1.7(2.2)°, p=0.64; PI: 2.2(2.1)°, p=0.24; LL: 2.6(4.0)°, p=0.89; T1PA: 1.1(0.9)°, p=0.42; and L1PA: 1.4(1.6)°, p=0.49. Model predictions also exhibited excellent reliability at all parameters (ICC: 0.91-1.0).
Conclusions. SpinePose accurately predicted spinopelvic parameters with excellent reliability comparable to fellowship-trained spine surgeons and neuroradiologists. Utilization of predictive AI tools in spinal imaging can substantially aid in patient selection and surgical planning. - [1187] arXiv:2402.06187 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Premier-TACO is a Few-Shot Policy Learner: Pretraining Multitask Representation via Temporal Action-Driven Contrastive LossRuijie Zheng , Yongyuan Liang , Xiyao Wang , Shuang Ma , Hal Daumé III , Huazhe Xu , John Langford , Praveen Palanisamy , Kalyan Shankar Basu , Furong HuangSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: We present Premier-TACO, a multitask feature representation learning approach designed to improve few-shot policy learning efficiency in sequential decision-making tasks. Premier-TACO leverages a subset of multitask offline datasets for pretraining a general feature representation, which captures critical environmental dynamics and is fine-tuned using minimal expert demonstrations. It advances the temporal action contrastive learning (TACO) objective, known for state-of-the-art results in visual control tasks, by incorporating a novel negative example sampling strategy. This strategy is crucial in significantly boosting TACO's computational efficiency, making large-scale multitask offline pretraining feasible. Our extensive empirical evaluation in a diverse set of continuous control benchmarks including Deepmind Control Suite, MetaWorld, and LIBERO demonstrate Premier-TACO's effectiveness in pretraining visual representations, significantly enhancing few-shot imitation learning of novel tasks. Our code, pretraining data, as well as pretrained model checkpoints will be released at this https URL . Our project webpage is at this https URL .
- [1188] arXiv:2402.06188 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: A self-supervised framework for learning whole slide representationsXinhai Hou , Cheng Jiang , Akhil Kondepudi , Yiwei Lyu , Asadur Zaman Chowdury , Honglak Lee , Todd C. HollonComments: 18 pages, 11 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Whole slide imaging is fundamental to biomedical microscopy and computational pathology. However, whole slide images (WSIs) present a complex computer vision challenge due to their gigapixel size, diverse histopathologic features, spatial heterogeneity, and limited/absent data annotations. These challenges highlight that supervised training alone can result in suboptimal whole slide representations. Self-supervised representation learning can achieve high-quality WSI visual feature learning for downstream diagnostic tasks, such as cancer diagnosis or molecular genetic prediction. Here, we present a general self-supervised whole slide learning (S3L) framework for gigapixel-scale self-supervision of WSIs. S3L combines data transformation strategies from transformer-based vision and language modeling into a single unified framework to generate paired views for self-supervision. S3L leverages the inherent regional heterogeneity, histologic feature variability, and information redundancy within WSIs to learn high-quality whole-slide representations. We benchmark S3L visual representations on two diagnostic tasks for two biomedical microscopy modalities. S3L significantly outperforms WSI baselines for cancer diagnosis and genetic mutation prediction. Additionally, S3L achieves good performance using both in-domain and out-of-distribution patch encoders, demonstrating good flexibility and generalizability.
- [1189] arXiv:2402.06196 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Large Language Models: A SurveyShervin Minaee , Tomas Mikolov , Narjes Nikzad , Meysam Chenaghlu , Richard Socher , Xavier Amatriain , Jianfeng GaoComments: arXiv admin note: substantial text overlap with arXiv:2401.14423Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks, since the release of ChatGPT in November 2022. LLMs' ability of general-purpose language understanding and generation is acquired by training billions of model's parameters on massive amounts of text data, as predicted by scaling laws \cite{kaplan2020scaling,hoffmann2022training}. The research area of LLMs, while very recent, is evolving rapidly in many different ways. In this paper, we review some of the most prominent LLMs, including three popular LLM families (GPT, LLaMA, PaLM), and discuss their characteristics, contributions and limitations. We also give an overview of techniques developed to build, and augment LLMs. We then survey popular datasets prepared for LLM training, fine-tuning, and evaluation, review widely used LLM evaluation metrics, and compare the performance of several popular LLMs on a set of representative benchmarks. Finally, we conclude the paper by discussing open challenges and future research directions.
- [1190] arXiv:2402.06204 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: The Generative AI Paradox on Evaluation: What It Can Solve, It May Not EvaluateSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: This paper explores the assumption that Large Language Models (LLMs) skilled in generation tasks are equally adept as evaluators. We assess the performance of three LLMs and one open-source LM in Question-Answering (QA) and evaluation tasks using the TriviaQA (Joshi et al., 2017) dataset. Results indicate a significant disparity, with LLMs exhibiting lower performance in evaluation tasks compared to generation tasks. Intriguingly, we discover instances of unfaithful evaluation where models accurately evaluate answers in areas where they lack competence, underscoring the need to examine the faithfulness and trustworthiness of LLMs as evaluators. This study contributes to the understanding of "the Generative AI Paradox" (West et al., 2023), highlighting a need to explore the correlation between generative excellence and evaluation proficiency, and the necessity to scrutinize the faithfulness aspect in model evaluations.
- [1191] arXiv:2402.06229 (cross-list from cs.HC) [ pdf , ps , html , other ]
-
Title: Exploring Interaction Patterns for Debugging: Enhancing Conversational Capabilities of AI-assistantsBhavya Chopra , Yasharth Bajpai , Param Biyani , Gustavo Soares , Arjun Radhakrishna , Chris Parnin , Sumit GulwaniComments: 7 pages, 4 figures, 2 tablesSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Abstract: The widespread availability of Large Language Models (LLMs) within Integrated Development Environments (IDEs) has led to their speedy adoption. Conversational interactions with LLMs enable programmers to obtain natural language explanations for various software development tasks. However, LLMs often leap to action without sufficient context, giving rise to implicit assumptions and inaccurate responses. Conversations between developers and LLMs are primarily structured as question-answer pairs, where the developer is responsible for asking the the right questions and sustaining conversations across multiple turns. In this paper, we draw inspiration from interaction patterns and conversation analysis -- to design Robin, an enhanced conversational AI-assistant for debugging. Through a within-subjects user study with 12 industry professionals, we find that equipping the LLM to -- (1) leverage the insert expansion interaction pattern, (2) facilitate turn-taking, and (3) utilize debugging workflows -- leads to lowered conversation barriers, effective fault localization, and 5x improvement in bug resolution rates.
- [1192] arXiv:2402.06255 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Studious Bob Fight Back Against Jailbreaking via Prompt Adversarial TuningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
Abstract: Although Large Language Models (LLMs) have achieved tremendous success in various applications, they are also susceptible to certain prompts that can induce them to bypass built-in safety measures and provide dangerous or illegal content, a phenomenon known as jailbreak. To protect LLMs from producing harmful information, various defense strategies are proposed, with most focusing on content filtering or adversarial training of models. In this paper, we propose an approach named Prompt Adversarial Tuning (PAT) to train a defense control mechanism, which is then embedded as a prefix to user prompts to implement our defense strategy. We design a training process similar to adversarial training to achieve our optimized goal, alternating between updating attack and defense controls. To our knowledge, we are the first to implement defense from the perspective of prompt tuning. Once employed, our method will hardly impact the operational efficiency of LLMs. Experiments show that our method is effective in both black-box and white-box settings, reducing the success rate of advanced attacks to nearly 0 while maintaining the benign answer rate of 80% to simple benign questions. Our work might potentially chart a new perspective for future explorations in LLM security.
- [1193] arXiv:2402.06262 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: On the Efficacy of Eviction Policy for Key-Value Constrained Generative Language Model InferenceSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Despite the recent success associated with Large Language Models (LLMs), they are notably cost-prohibitive to deploy in resource-constrained environments due to their excessive memory and computational demands. In addition to model parameters, the key-value cache is also stored in GPU memory, growing linearly with batch size and sequence length. As a remedy, recent works have proposed various eviction policies for maintaining the overhead of key-value cache under a given budget. This paper embarks on the efficacy of existing eviction policies in terms of importance score calculation and eviction scope construction. We identify the deficiency of prior policies in these two aspects and introduce RoCo, a robust cache omission policy based on temporal attention scores and robustness measures. Extensive experimentation spanning prefilling and auto-regressive decoding stages validates the superiority of RoCo. Finally, we release EasyKV, a versatile software package dedicated to user-friendly key-value constrained generative inference. Code available at this https URL .
- [1194] arXiv:2402.06287 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: AI, Meet Human: Learning Paradigms for Hybrid Decision Making SystemsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Abstract: Everyday we increasingly rely on machine learning models to automate and support high-stake tasks and decisions. This growing presence means that humans are now constantly interacting with machine learning-based systems, training and using models everyday. Several different techniques in computer science literature account for the human interaction with machine learning systems, but their classification is sparse and the goals varied. This survey proposes a taxonomy of Hybrid Decision Making Systems, providing both a conceptual and technical framework for understanding how current computer science literature models interaction between humans and machines.
- [1195] arXiv:2402.06299 (cross-list from cs.NE) [ pdf , ps , html , other ]
-
Title: A Functional Analysis Approach to Symbolic RegressionComments: 14 pages, 3 figures. Submitted to Genetic and Evolutionary Computation Conference (GECCO-2024)Subjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI)
Abstract: Symbolic regression (SR) poses a significant challenge for randomized search heuristics due to its reliance on the synthesis of expressions for input-output mappings. Although traditional genetic programming (GP) algorithms have achieved success in various domains, they exhibit limited performance when tree-based representations are used for SR. To address these limitations, we introduce a novel SR approach called Fourier Tree Growing (FTG) that draws insights from functional analysis. This new perspective enables us to perform optimization directly in a different space, thus avoiding intricate symbolic expressions. Our proposed algorithm exhibits significant performance improvements over traditional GP methods on a range of classical one-dimensional benchmarking problems. To identify and explain limiting factors of GP and FTG, we perform experiments on a large-scale polynomials benchmark with high-order polynomials up to degree 100. To the best of the authors' knowledge, this work represents the pioneering application of functional analysis in addressing SR problems. The superior performance of the proposed algorithm and insights into the limitations of GP open the way for further advancing GP for SR and related areas of explainable machine learning.
- [1196] arXiv:2402.06304 (cross-list from cs.SD) [ pdf , ps , other ]
-
Title: A New Approach to Voice AuthenticityNicolas M. Müller , Piotr Kawa , Shen Hu , Matthias Neu , Jennifer Williams , Philip Sperl , Konstantin BöttingerSubjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Abstract: Voice faking, driven primarily by recent advances in text-to-speech (TTS) synthesis technology, poses significant societal challenges. Currently, the prevailing assumption is that unaltered human speech can be considered genuine, while fake speech comes from TTS synthesis. We argue that this binary distinction is oversimplified. For instance, altered playback speeds can be used for malicious purposes, like in the 'Drunken Nancy Pelosi' incident. Similarly, editing of audio clips can be done ethically, e.g., for brevity or summarization in news reporting or podcasts, but editing can also create misleading narratives. In this paper, we propose a conceptual shift away from the binary paradigm of audio being either 'fake' or 'real'. Instead, our focus is on pinpointing 'voice edits', which encompass traditional modifications like filters and cuts, as well as TTS synthesis and VC systems. We delineate 6 categories and curate a new challenge dataset rooted in the M-AILABS corpus, for which we present baseline detection systems. And most importantly, we argue that merely categorizing audio as fake or real is a dangerous over-simplification that will fail to move the field of speech technology forward.
- [1197] arXiv:2402.06334 (cross-list from cs.IR) [ pdf , ps , other ]
-
Title: ExaRanker-Open: Synthetic Explanation for IR using Open-Source LLMsSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: ExaRanker recently introduced an approach to training information retrieval (IR) models, incorporating natural language explanations as additional labels. The method addresses the challenge of limited labeled examples, leading to improvements in the effectiveness of IR models. However, the initial results were based on proprietary language models such as GPT-3.5, which posed constraints on dataset size due to its cost and data privacy. In this paper, we introduce ExaRanker-Open, where we adapt and explore the use of open-source language models to generate explanations. The method has been tested using different LLMs and datasets sizes to better comprehend the effective contribution of data augmentation. Our findings reveal that incorporating explanations consistently enhances neural rankers, with benefits escalating as the LLM size increases. Notably, the data augmentation method proves advantageous even with large datasets, as evidenced by ExaRanker surpassing the target baseline by 0.6 nDCG@10 points in our study. To encourage further advancements by the research community, we have open-sourced both the code and datasets at this https URL .
- [1198] arXiv:2402.06360 (cross-list from cs.IR) [ pdf , ps , other ]
-
Title: CoSearchAgent: A Lightweight Collaborative Search Agent with Large Language ModelsComments: 4 pages, demoSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Collaborative search supports multiple users working together to accomplish a specific search task. Research has found that designing lightweight collaborative search plugins within instant messaging platforms aligns better with users' collaborative habits. However, due to the complexity of multi-user interaction scenarios, it is challenging to implement a fully functioning lightweight collaborative search system. Therefore, previous studies on lightweight collaborative search had to rely on the Wizard of Oz paradigm. In recent years, large language models (LLMs) have been demonstrated to interact naturally with users and achieve complex information-seeking tasks through LLM-based agents. Hence, to better support the research in collaborative search, in this demo, we propose CoSearchAgent, a lightweight collaborative search agent powered by LLMs. CoSearchAgent is designed as a Slack plugin that can support collaborative search during multi-party conversations on this platform. Equipped with the capacity to understand the queries and context in multi-user conversations and the ability to search the Web for relevant information via APIs, CoSearchAgent can respond to user queries with answers grounded on the relevant search results. It can also ask clarifying questions when the information needs are unclear. The proposed CoSearchAgent is highly flexible and would be useful for supporting further research on collaborative search. The code and demo video are accessible.
- [1199] arXiv:2402.06377 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: High-Precision Geosteering via Reinforcement Learning and Particle FiltersRessi Bonti Muhammad , Apoorv Srivastava , Sergey Alyaev , Reidar Brumer Bratvold , Daniel M. TartakovskyComments: 40 pagesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Geophysics (physics.geo-ph)
Abstract: Geosteering, a key component of drilling operations, traditionally involves manual interpretation of various data sources such as well-log data. This introduces subjective biases and inconsistent procedures. Academic attempts to solve geosteering decision optimization with greedy optimization and Approximate Dynamic Programming (ADP) showed promise but lacked adaptivity to realistic diverse scenarios. Reinforcement learning (RL) offers a solution to these challenges, facilitating optimal decision-making through reward-based iterative learning. State estimation methods, e.g., particle filter (PF), provide a complementary strategy for geosteering decision-making based on online information. We integrate an RL-based geosteering with PF to address realistic geosteering scenarios. Our framework deploys PF to process real-time well-log data to estimate the location of the well relative to the stratigraphic layers, which then informs the RL-based decision-making process. We compare our method's performance with that of using solely either RL or PF. Our findings indicate a synergy between RL and PF in yielding optimized geosteering decisions.
- [1200] arXiv:2402.06388 (cross-list from stat.ML) [ pdf , ps , other ]
-
Title: On the Convergence Rate of the Stochastic Gradient Descent (SGD) and application to a modified policy gradient for the Multi Armed BanditSubjects: Machine Learning (stat.ML) ; Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Numerical Analysis (math.NA)
Abstract: We present a self-contained proof of the convergence rate of the Stochastic Gradient Descent (SGD) when the learning rate follows an inverse time decays schedule; we next apply the results to the convergence of a modified form of policy gradient Multi-Armed Bandit (MAB) with $L2$ regularization.
- [1201] arXiv:2402.06397 (cross-list from cs.CC) [ pdf , ps , other ]
-
Title: Finding hardness reductions automatically using SAT solversSubjects: Computational Complexity (cs.CC) ; Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Logic in Computer Science (cs.LO); Combinatorics (math.CO)
Abstract: In this article, we show that the completion problem, i.e. the decision problem whether a partial structure can be completed to a full structure, is NP-complete for many combinatorial structures. While the gadgets for most reductions in literature are found by hand, we present an algorithm to construct gadgets in a fully automated way. Using our framework which is based on SAT, we present the first thorough study of the completion problem on sign mappings with forbidden substructures by classifying thousands of structures for which the completion problem is NP-complete. Our list in particular includes interior triple systems, which were introduced by Knuth towards an axiomatization of planar point configurations. Last but not least, we give an infinite family of structures generalizing interior triple system to higher dimensions for which the completion problem is NP-complete.
- [1202] arXiv:2402.06402 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Hierarchical Transformers are Efficient Meta-Reinforcement LearnersSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: We introduce Hierarchical Transformers for Meta-Reinforcement Learning (HTrMRL), a powerful online meta-reinforcement learning approach. HTrMRL aims to address the challenge of enabling reinforcement learning agents to perform effectively in previously unseen tasks. We demonstrate how past episodes serve as a rich source of information, which our model effectively distills and applies to new contexts. Our learned algorithm is capable of outperforming the previous state-of-the-art and provides more efficient meta-training while significantly improving generalization capabilities. Experimental results, obtained across various simulated tasks of the Meta-World Benchmark, indicate a significant improvement in learning efficiency and adaptability compared to the state-of-the-art on a variety of tasks. Our approach not only enhances the agent's ability to generalize from limited data but also paves the way for more robust and versatile AI systems.
- [1203] arXiv:2402.06457 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: V-STaR: Training Verifiers for Self-Taught ReasonersArian Hosseini , Xingdi Yuan , Nikolay Malkin , Aaron Courville , Alessandro Sordoni , Rishabh AgarwalSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Common self-improvement approaches for large language models (LLMs), such as STaR (Zelikman et al., 2022), iteratively fine-tune LLMs on self-generated solutions to improve their problem-solving ability. However, these approaches discard the large amounts of incorrect solutions generated during this process, potentially neglecting valuable information in such solutions. To address this shortcoming, we propose V-STaR that utilizes both the correct and incorrect solutions generated during the self-improvement process to train a verifier using DPO that judges correctness of model-generated solutions. This verifier is used at inference time to select one solution among many candidate solutions. Running V-STaR for multiple iterations results in progressively better reasoners and verifiers, delivering a 4% to 17% test accuracy improvement over existing self-improvement and verification approaches on common code generation and math reasoning benchmarks with LLaMA2 models.
- [1204] arXiv:2402.06472 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: "When He Feels Cold, He Goes to the Seahorse"-Blending Generative AI into Multimaterial Storymaking for Family Expressive Arts TherapyComments: to appear at ACM CHI '24Subjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Storymaking, as an integrative form of expressive arts therapy, is an effective means to foster family communication. Yet, the integration of generative AI as expressive materials in therapeutic storymaking remains underexplored. And there is a lack of HCI implications on how to support families and therapists in this context. Addressing this, our study involved five weeks of storymaking sessions with seven families guided by a professional therapist. In these sessions, the families used both traditional art-making materials and image-based generative AI to create and evolve their family stories. Via the rich empirical data and commentaries from four expert therapists, we contextualize how families creatively melded AI and traditional expressive materials to externalize their ideas and feelings. Through the lens of Expressive Therapies Continuum (ETC), we characterize the therapeutic implications of AI as expressive materials. Desirable interaction qualities to support children, parents, and therapists are distilled for future HCI research.
- [1205] arXiv:2402.06492 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Inducing Systematicity in Transformers by Attending to Structurally Quantized EmbeddingsComments: 22 pages, code: this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Transformers generalize to novel compositions of structures and entities after being trained on a complex dataset, but easily overfit on datasets of insufficient complexity. We observe that when the training set is sufficiently complex, the model encodes sentences that have a common syntactic structure using a systematic attention pattern. Inspired by this observation, we propose SQ-Transformer (Structurally Quantized) that explicitly encourages systematicity in the embeddings and attention layers, even with a training set of low complexity. At the embedding level, we introduce Structure-oriented Vector Quantization (SoVQ) to cluster word embeddings into several classes of structurally equivalent entities. At the attention level, we devise the Systematic Attention Layer (SAL) and an alternative, Systematically Regularized Layer (SRL) that operate on the quantized word embeddings so that sentences of the same structure are encoded with invariant or similar attention patterns. Empirically, we show that SQ-Transformer achieves stronger compositional generalization than the vanilla Transformer on multiple low-complexity semantic parsing and machine translation datasets. In our analysis, we show that SoVQ indeed learns a syntactically clustered embedding space and SAL/SRL induces generalizable attention patterns, which lead to improved systematicity.
- [1206] arXiv:2402.06501 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Scalable Interactive Machine Learning for Future Command and ControlAnna Madison , Ellen Novoseller , Vinicius G. Goecks , Benjamin T. Files , Nicholas Waytowich , Alfred Yu , Vernon J. Lawhern , Steven Thurman , Christopher Kelshaw , Kaleb McDowellComments: Accepted at the NATO Science and Technology Organization Symposium (ICMCIS) organized by the Information Systems Technology (IST) Panel, IST-205-RSY - the ICMCIS, held in Koblenz, Germany, 23-24 April 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Abstract: Future warfare will require Command and Control (C2) personnel to make decisions at shrinking timescales in complex and potentially ill-defined situations. Given the need for robust decision-making processes and decision-support tools, integration of artificial and human intelligence holds the potential to revolutionize the C2 operations process to ensure adaptability and efficiency in rapidly changing operational environments. We propose to leverage recent promising breakthroughs in interactive machine learning, in which humans can cooperate with machine learning algorithms to guide machine learning algorithm behavior. This paper identifies several gaps in state-of-the-art science and technology that future work should address to extend these approaches to function in complex C2 contexts. In particular, we describe three research focus areas that together, aim to enable scalable interactive machine learning (SIML): 1) developing human-AI interaction algorithms to enable planning in complex, dynamic situations; 2) fostering resilient human-AI teams through optimizing roles, configurations, and trust; and 3) scaling algorithms and human-AI teams for flexibility across a range of potential contexts and situations.
- [1207] arXiv:2402.06506 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Classifying point clouds at the facade-level using geometric features and deep learning networksComments: Accepted to the Recent Advances in 3D Geoinformation Science, Proceedings of the 18th 3D GeoInfo Conference 2023Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: 3D building models with facade details are playing an important role in many applications now. Classifying point clouds at facade-level is key to create such digital replicas of the real world. However, few studies have focused on such detailed classification with deep neural networks. We propose a method fusing geometric features with deep learning networks for point cloud classification at facade-level. Our experiments conclude that such early-fused features improve deep learning methods' performance. This method can be applied for compensating deep learning networks' ability in capturing local geometric information and promoting the advancement of semantic segmentation.
- [1208] arXiv:2402.06509 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Asking the Right Question at the Right Time: Human and Model Uncertainty Guidance to Ask Clarification QuestionsComments: Accepted at EACL 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Clarification questions are an essential dialogue tool to signal misunderstanding, ambiguities, and under-specification in language use. While humans are able to resolve uncertainty by asking questions since childhood, modern dialogue systems struggle to generate effective questions. To make progress in this direction, in this work we take a collaborative dialogue task as a testbed and study how model uncertainty relates to human uncertainty -- an as yet under-explored problem. We show that model uncertainty does not mirror human clarification-seeking behavior, which suggests that using human clarification questions as supervision for deciding when to ask may not be the most effective way to resolve model uncertainty. To address this issue, we propose an approach to generating clarification questions based on model uncertainty estimation, compare it to several alternatives, and show that it leads to significant improvements in terms of task success. Our findings highlight the importance of equipping dialogue systems with the ability to assess their own uncertainty and exploit in interaction.
- [1209] arXiv:2402.06530 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Refining Myocardial Infarction Detection: A Novel Multi-Modal Composite Kernel Strategy in One-Class ClassificationMuhammad Uzair Zahid , Aysen Degerli , Fahad Sohrab , Serkan Kiranyaz , Tahir Hamid , Rashid Mazhar , Moncef GabboujSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Abstract: Early detection of myocardial infarction (MI), a critical condition arising from coronary artery disease (CAD), is vital to prevent further myocardial damage. This study introduces a novel method for early MI detection using a one-class classification (OCC) algorithm in echocardiography. Our study overcomes the challenge of limited echocardiography data availability by adopting a novel approach based on Multi-modal Subspace Support Vector Data Description. The proposed technique involves a specialized MI detection framework employing multi-view echocardiography incorporating a composite kernel in the non-linear projection trick, fusing Gaussian and Laplacian sigmoid functions. Additionally, we enhance the update strategy of the projection matrices by adapting maximization for both or one of the modalities in the optimization process. Our method boosts MI detection capability by efficiently transforming features extracted from echocardiography data into an optimized lower-dimensional subspace. The OCC model trained specifically on target class instances from the comprehensive HMC-QU dataset that includes multiple echocardiography views indicates a marked improvement in MI detection accuracy. Our findings reveal that our proposed multi-view approach achieves a geometric mean of 71.24%, signifying a substantial advancement in echocardiography-based MI diagnosis and offering more precise and efficient diagnostic tools.
- [1210] arXiv:2402.06532 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Generative Adversarial Bayesian Optimization for Surrogate ObjectivesComments: 15 pages, 3 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Offline model-based policy optimization seeks to optimize a learned surrogate objective function without querying the true oracle objective during optimization. However, inaccurate surrogate model predictions are frequently encountered along the optimization trajectory. To address this limitation, we propose generative adversarial Bayesian optimization (GABO) using adaptive source critic regularization, a task-agnostic framework for Bayesian optimization that employs a Lipschitz-bounded source critic model to constrain the optimization trajectory to regions where the surrogate function is reliable. We show that under certain assumptions for the continuous input space prior, our algorithm dynamically adjusts the strength of the source critic regularization. GABO outperforms existing baselines on a number of different offline optimization tasks across a variety of scientific domains. Our code is available at this https URL
- [1211] arXiv:2402.06544 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Calibrating Long-form Generations from Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: To enhance Large Language Models' (LLMs) reliability, calibration is essential -- the model's assessed confidence scores should align with the actual likelihood of its responses being correct. However, current confidence elicitation methods and calibration metrics typically rely on a binary true/false assessment of response correctness. This approach does not apply to long-form generation, where an answer can be partially correct. Addressing this gap, we introduce a unified calibration framework, in which both the correctness of the LLMs' responses and their associated confidence levels are treated as distributions across a range of scores. Within this framework, we develop three metrics to precisely evaluate LLM calibration and further propose two confidence elicitation methods based on self-consistency and self-evaluation. Our experiments, which include long-form QA and summarization tasks, demonstrate that larger models don't necessarily guarantee better calibration, that calibration performance is found to be metric-dependent, and that self-consistency methods excel in factoid datasets. We also find that calibration can be enhanced through techniques such as fine-tuning, integrating relevant source documents, scaling the temperature, and combining self-consistency with self-evaluation. Lastly, we showcase a practical application of our system: selecting and cascading open-source models and ChatGPT to optimize correctness given a limited API budget. This research not only challenges existing notions of LLM calibration but also offers practical methodologies for improving trustworthiness in long-form generation.
- [1212] arXiv:2402.06549 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Bryndza at ClimateActivism 2024: Stance, Target and Hate Event Detection via Retrieval-Augmented GPT-4 and LLaMAComments: Accepted to the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: This study details our approach for the CASE 2024 Shared Task on Climate Activism Stance and Hate Event Detection, focusing on Hate Speech Detection, Hate Speech Target Identification, and Stance Detection as classification challenges. We explored the capability of Large Language Models (LLMs), particularly GPT-4, in zero- or few-shot settings enhanced by retrieval augmentation and re-ranking for Tweet classification. Our goal was to determine if LLMs could match or surpass traditional methods in this context.
We conducted an ablation study with LLaMA for comparison, and our results indicate that our models significantly outperformed the baselines, securing second place in the Target Detection task. The code for our submission is available at this https URL - [1213] arXiv:2402.06559 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Diffusion-ES: Gradient-free Planning with Diffusion for Autonomous Driving and Zero-Shot Instruction FollowingBrian Yang , Huangyuan Su , Nikolaos Gkanatsios , Tsung-Wei Ke , Ayush Jain , Jeff Schneider , Katerina FragkiadakiSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Robotics (cs.RO)
Abstract: Diffusion models excel at modeling complex and multimodal trajectory distributions for decision-making and control. Reward-gradient guided denoising has been recently proposed to generate trajectories that maximize both a differentiable reward function and the likelihood under the data distribution captured by a diffusion model. Reward-gradient guided denoising requires a differentiable reward function fitted to both clean and noised samples, limiting its applicability as a general trajectory optimizer. In this paper, we propose DiffusionES, a method that combines gradient-free optimization with trajectory denoising to optimize black-box non-differentiable objectives while staying in the data manifold. Diffusion-ES samples trajectories during evolutionary search from a diffusion model and scores them using a black-box reward function. It mutates high-scoring trajectories using a truncated diffusion process that applies a small number of noising and denoising steps, allowing for much more efficient exploration of the solution space. We show that DiffusionES achieves state-of-the-art performance on nuPlan, an established closed-loop planning benchmark for autonomous driving. Diffusion-ES outperforms existing sampling-based planners, reactive deterministic or diffusion-based policies, and reward-gradient guidance. Additionally, we show that unlike prior guidance methods, our method can optimize non-differentiable language-shaped reward functions generated by few-shot LLM prompting. When guided by a human teacher that issues instructions to follow, our method can generate novel, highly complex behaviors, such as aggressive lane weaving, which are not present in the training data. This allows us to solve the hardest nuPlan scenarios which are beyond the capabilities of existing trajectory optimization methods and driving policies.
- [1214] arXiv:2402.06563 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: What is Hiding in Medicine's Dark Matter? Learning with Missing Data in Medical PracticesNeslihan Suzen , Evgeny M. Mirkes , Damian Roland , Jeremy Levesley , Alexander N. Gorban , Tim J. CoatsComments: 8 pagesJournal-ref: 2023 IEEE International Conference on Big Data (BigData), 4979-4986Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Information Theory (cs.IT)
Abstract: Electronic patient records (EPRs) produce a wealth of data but contain significant missing information. Understanding and handling this missing data is an important part of clinical data analysis and if left unaddressed could result in bias in analysis and distortion in critical conclusions. Missing data may be linked to health care professional practice patterns and imputation of missing data can increase the validity of clinical decisions. This study focuses on statistical approaches for understanding and interpreting the missing data and machine learning based clinical data imputation using a single centre's paediatric emergency data and the data from UK's largest clinical audit for traumatic injury database (TARN). In the study of 56,961 data points related to initial vital signs and observations taken on children presenting to an Emergency Department, we have shown that missing data are likely to be non-random and how these are linked to health care professional practice patterns. We have then examined 79 TARN fields with missing values for 5,791 trauma cases. Singular Value Decomposition (SVD) and k-Nearest Neighbour (kNN) based missing data imputation methods are used and imputation results against the original dataset are compared and statistically tested. We have concluded that the 1NN imputer is the best imputation which indicates a usual pattern of clinical decision making: find the most similar patients and take their attributes as imputation.
- [1215] arXiv:2402.06584 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: G-SciEdBERT: A Contextualized LLM for Science Assessment Tasks in GermanComments: First German Science Education LLM, Submitted to AIED2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The advancement of natural language processing has paved the way for automated scoring systems in various languages, such as German (e.g., German BERT [G-BERT]). Automatically scoring written responses to science questions in German is a complex task and challenging for standard G-BERT as they lack contextual knowledge in the science domain and may be unaligned with student writing styles. This paper developed a contextualized German Science Education BERT (G-SciEdBERT), an innovative large language model tailored for scoring German-written responses to science tasks. Using G-BERT, we pre-trained G-SciEdBERT on a corpus of 50K German written science responses with 5M tokens to the Programme for International Student Assessment (PISA) 2015. We fine-tuned G-SciEdBERT on 59 assessment items and examined the scoring accuracy. We then compared its performance with G-BERT. Our findings reveal a substantial improvement in scoring accuracy with G-SciEdBERT, demonstrating a 10% increase of quadratic weighted kappa compared to G-BERT (mean accuracy difference = 0.096, SD = 0.024). These insights underline the significance of specialized language models like G-SciEdBERT, which is trained to enhance the accuracy of automated scoring, offering a substantial contribution to the field of AI in education.
- [1216] arXiv:2402.06599 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: On the Out-Of-Distribution Generalization of Multimodal Large Language ModelsXingxuan Zhang , Jiansheng Li , Wenjing Chu , Junjia Hai , Renzhe Xu , Yuqing Yang , Shikai Guan , Jiazheng Xu , Peng CuiSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: We investigate the generalization boundaries of current Multimodal Large Language Models (MLLMs) via comprehensive evaluation under out-of-distribution scenarios and domain-specific tasks. We evaluate their zero-shot generalization across synthetic images, real-world distributional shifts, and specialized datasets like medical and molecular imagery. Empirical results indicate that MLLMs struggle with generalization beyond common training domains, limiting their direct application without adaptation. To understand the cause of unreliable performance, we analyze three hypotheses: semantic misinterpretation, visual feature extraction insufficiency, and mapping deficiency. Results identify mapping deficiency as the primary hurdle. To address this problem, we show that in-context learning (ICL) can significantly enhance MLLMs' generalization, opening new avenues for overcoming generalization barriers. We further explore the robustness of ICL under distribution shifts and show its vulnerability to domain shifts, label shifts, and spurious correlation shifts between in-context examples and test data.
- [1217] arXiv:2402.06606 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: RQP-SGD: Differential Private Machine Learning through Noisy SGD and Randomized QuantizationComments: This work is accepted by the 5th AAAI Workshop on Privacy-Preserving Artificial IntelligenceSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: The rise of IoT devices has prompted the demand for deploying machine learning at-the-edge with real-time, efficient, and secure data processing. In this context, implementing machine learning (ML) models with real-valued weight parameters can prove to be impractical particularly for large models, and there is a need to train models with quantized discrete weights. At the same time, these low-dimensional models also need to preserve privacy of the underlying dataset. In this work, we present RQP-SGD, a new approach for privacy-preserving quantization to train machine learning models for low-memory ML-at-the-edge. This approach combines differentially private stochastic gradient descent (DP-SGD) with randomized quantization, providing a measurable privacy guarantee in machine learning. In particular, we study the utility convergence of implementing RQP-SGD on ML tasks with convex objectives and quantization constraints and demonstrate its efficacy over deterministic quantization. Through experiments conducted on two datasets, we show the practical effectiveness of RQP-SGD.
- [1218] arXiv:2402.06608 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: TIC: Translate-Infer-Compile for accurate 'text to plan' using LLMs and logical intermediate representationsComments: 20 pages (7 main + 2 references + 11 appendix), 4 figures, 2 tablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: We study the problem of generating plans for given natural language planning task requests. On one hand, LLMs excel at natural language processing but do not perform well on planning. On the other hand, classical planning tools excel at planning tasks but require input in a structured language such as the Planning Domain Definition Language (PDDL). We leverage the strengths of both the techniques by using an LLM for generating the PDDL representation (task PDDL) of planning task requests followed by using a classical planner for computing a plan. Unlike previous approaches that use LLMs for generating task PDDLs directly, our approach comprises of (a) translate: using an LLM only for generating a logically interpretable intermediate representation of natural language task descriptions, (b) infer: deriving additional logically dependent information from the intermediate representation using a logic reasoner (currently, Answer Set Programming solver), and (c) compile: generating the target task PDDL from the base and inferred information. We observe that using an LLM to only output the intermediate representation significantly reduces LLM errors. Consequently, TIC approach achieves, for at least one LLM, high accuracy on task PDDL generation for all seven domains of our evaluation dataset.
- [1219] arXiv:2402.06619 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Aya Dataset: An Open-Access Collection for Multilingual Instruction TuningShivalika Singh , Freddie Vargus , Daniel Dsouza , Börje F. Karlsson , Abinaya Mahendiran , Wei-Yin Ko , Herumb Shandilya , Jay Patel , Deividas Mataciunas , Laura OMahony , Mike Zhang , Ramith Hettiarachchi , Joseph Wilson , Marina Machado , Luisa Souza Moura , Dominik Krzemiński , Hakimeh Fadaei , Irem Ergün , Ifeoma Okoh , Aisha Alaagib , Oshan Mudannayake , Zaid Alyafeai , Vu Minh Chien , Sebastian Ruder , Surya Guthikonda , Emad A. Alghamdi , Sebastian Gehrmann , Niklas Muennighoff , Max Bartolo , Julia Kreutzer , Ahmet Üstün , Marzieh Fadaee , Sara HookerSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Datasets are foundational to many breakthroughs in modern artificial intelligence. Many recent achievements in the space of natural language processing (NLP) can be attributed to the finetuning of pre-trained models on a diverse set of tasks that enables a large language model (LLM) to respond to instructions. Instruction fine-tuning (IFT) requires specifically constructed and annotated datasets. However, existing datasets are almost all in the English language. In this work, our primary goal is to bridge the language gap by building a human-curated instruction-following dataset spanning 65 languages. We worked with fluent speakers of languages from around the world to collect natural instances of instructions and completions. Furthermore, we create the most extensive multilingual collection to date, comprising 513 million instances through templating and translating existing datasets across 114 languages. In total, we contribute four key resources: we develop and open-source the Aya Annotation Platform, the Aya Dataset, the Aya Collection, and the Aya Evaluation Suite. The Aya initiative also serves as a valuable case study in participatory research, involving collaborators from 119 countries. We see this as a valuable framework for future research collaborations that aim to bridge gaps in resources.
- [1220] arXiv:2402.06627 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Feedback Loops With Language Models Drive In-Context Reward HackingComments: 44 pages, 12 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Language models influence the external world: they query APIs that read and write to web pages, generate content that shapes human behavior, and run system commands as autonomous agents. These interactions form feedback loops: LLM outputs affect the world, which in turn affect subsequent LLM outputs. In this work, we show that feedback loops can cause in-context reward hacking (ICRH), where the LLM at test-time optimizes a (potentially implicit) objective but creates negative side effects in the process. For example, consider an LLM agent deployed to increase Twitter engagement; the LLM may retrieve its previous tweets into the context window and make them more controversial, increasing engagement but also toxicity. We identify and study two processes that lead to ICRH: output-refinement and policy-refinement. For these processes, evaluations on static datasets are insufficient -- they miss the feedback effects and thus cannot capture the most harmful behavior. In response, we provide three recommendations for evaluation to capture more instances of ICRH. As AI development accelerates, the effects of feedback loops will proliferate, increasing the need to understand their role in shaping LLM behavior.
- [1221] arXiv:2402.06629 (cross-list from cs.CG) [ pdf , ps , html , other ]
-
Title: Towards the mathematical foundation of the minimum enclosing ball and related problemsComments: arXiv admin note: text overlap with arXiv:2401.03232Subjects: Computational Geometry (cs.CG) ; Artificial Intelligence (cs.AI); Geometric Topology (math.GT)
Abstract: Theoretical background is provided towards the mathematical foundation of the minimum enclosing ball problem. This problem concerns the determination of the unique spherical surface of smallest radius enclosing a given bounded set in the d-dimensional Euclidean space. The study of several problems that are similar or related to the minimum enclosing ball problem has received a considerable impetus from the large amount of applications of these problems in various fields of science and technology. The proposed theoretical framework is based on several enclosing (covering) and partitioning (clustering) theorems and provides among others bounds and relations between the circumradius, inradius, diameter and width of a set. These enclosing and partitioning theorems are considered as cornerstones in the field that strongly influencing developments and generalizations to other spaces and non-Euclidean geometries.
- [1222] arXiv:2402.06638 (cross-list from q-fin.ST) [ pdf , ps , html , other ]
-
Title: Transformers with Attentive Federated Aggregation for Time Series Stock ForecastingComments: Published in IEEE ICOIN 2023Subjects: Statistical Finance (q-fin.ST) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Abstract: Recent innovations in transformers have shown their superior performance in natural language processing (NLP) and computer vision (CV). The ability to capture long-range dependencies and interactions in sequential data has also triggered a great interest in time series modeling, leading to the widespread use of transformers in many time series applications. However, being the most common and crucial application, the adaptation of transformers to time series forecasting has remained limited, with both promising and inconsistent results. In contrast to the challenges in NLP and CV, time series problems not only add the complexity of order or temporal dependence among input sequences but also consider trend, level, and seasonality information that much of this data is valuable for decision making. The conventional training scheme has shown deficiencies regarding model overfitting, data scarcity, and privacy issues when working with transformers for a forecasting task. In this work, we propose attentive federated transformers for time series stock forecasting with better performance while preserving the privacy of participating enterprises. Empirical results on various stock data from the Yahoo! Finance website indicate the superiority of our proposed scheme in dealing with the above challenges and data heterogeneity in federated learning.
- [1223] arXiv:2402.06655 (cross-list from cs.CR) [ pdf , ps , html , other ]
-
Title: Adversarial Text Purification: A Large Language Model Approach for DefenseComments: PAKDD 2024Subjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Adversarial purification is a defense mechanism for safeguarding classifiers against adversarial attacks without knowing the type of attacks or training of the classifier. These techniques characterize and eliminate adversarial perturbations from the attacked inputs, aiming to restore purified samples that retain similarity to the initially attacked ones and are correctly classified by the classifier. Due to the inherent challenges associated with characterizing noise perturbations for discrete inputs, adversarial text purification has been relatively unexplored. In this paper, we investigate the effectiveness of adversarial purification methods in defending text classifiers. We propose a novel adversarial text purification that harnesses the generative capabilities of Large Language Models (LLMs) to purify adversarial text without the need to explicitly characterize the discrete noise perturbations. We utilize prompt engineering to exploit LLMs for recovering the purified examples for given adversarial examples such that they are semantically similar and correctly classified. Our proposed method demonstrates remarkable performance over various classifiers, improving their accuracy under the attack by over 65% on average.
- [1224] arXiv:2402.06656 (cross-list from q-fin.ST) [ pdf , ps , html , other ]
-
Title: DiffsFormer: A Diffusion Transformer on Stock Factor AugmentationSubjects: Statistical Finance (q-fin.ST) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Machine learning models have demonstrated remarkable efficacy and efficiency in a wide range of stock forecasting tasks. However, the inherent challenges of data scarcity, including low signal-to-noise ratio (SNR) and data homogeneity, pose significant obstacles to accurate forecasting. To address this issue, we propose a novel approach that utilizes artificial intelligence-generated samples (AIGS) to enhance the training procedures. In our work, we introduce the Diffusion Model to generate stock factors with Transformer architecture (DiffsFormer). DiffsFormer is initially trained on a large-scale source domain, incorporating conditional guidance so as to capture global joint distribution. When presented with a specific downstream task, we employ DiffsFormer to augment the training procedure by editing existing samples. This editing step allows us to control the strength of the editing process, determining the extent to which the generated data deviates from the target domain. To evaluate the effectiveness of DiffsFormer augmented training, we conduct experiments on the CSI300 and CSI800 datasets, employing eight commonly used machine learning models. The proposed method achieves relative improvements of 7.2% and 27.8% in annualized return ratio for the respective datasets. Furthermore, we perform extensive experiments to gain insights into the functionality of DiffsFormer and its constituent components, elucidating how they address the challenges of data scarcity and enhance the overall model performance. Our research demonstrates the efficacy of leveraging AIGS and the DiffsFormer architecture to mitigate data scarcity in stock forecasting tasks.
- [1225] arXiv:2402.06659 (cross-list from cs.CR) [ pdf , ps , html , other ]
-
Title: Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language ModelsYuancheng Xu , Jiarui Yao , Manli Shu , Yanchao Sun , Zichu Wu , Ning Yu , Tom Goldstein , Furong HuangSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Vision-Language Models (VLMs) excel in generating textual responses from visual inputs, yet their versatility raises significant security concerns. This study takes the first step in exposing VLMs' susceptibility to data poisoning attacks that can manipulate responses to innocuous, everyday prompts. We introduce Shadowcast, a stealthy data poisoning attack method where poison samples are visually indistinguishable from benign images with matching texts. Shadowcast demonstrates effectiveness in two attack types. The first is Label Attack, tricking VLMs into misidentifying class labels, such as confusing Donald Trump for Joe Biden. The second is Persuasion Attack, which leverages VLMs' text generation capabilities to craft narratives, such as portraying junk food as health food, through persuasive and seemingly rational descriptions. We show that Shadowcast are highly effective in achieving attacker's intentions using as few as 50 poison samples. Moreover, these poison samples remain effective across various prompts and are transferable across different VLM architectures in the black-box setting. This work reveals how poisoned VLMs can generate convincing yet deceptive misinformation and underscores the importance of data quality for responsible deployments of VLMs. Our code is available at: this https URL .
- [1226] arXiv:2402.06663 (cross-list from cs.CR) [ pdf , ps , html , other ]
-
Title: Explainable Adversarial Learning Framework on Physical Layer Secret Keys Combating Malicious Reconfigurable Intelligent SurfaceSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: The development of reconfigurable intelligent surfaces (RIS) is a double-edged sword to physical layer security (PLS). Whilst a legitimate RIS can yield beneficial impacts including increased channel randomness to enhance physical layer secret key generation (PL-SKG), malicious RIS can poison legitimate channels and crack most of existing PL-SKGs. In this work, we propose an adversarial learning framework between legitimate parties (namely Alice and Bob) to address this Man-in-the-middle malicious RIS (MITM-RIS) eavesdropping. First, the theoretical mutual information gap between legitimate pairs and MITM-RIS is deduced. Then, Alice and Bob leverage generative adversarial networks (GANs) to learn to achieve a common feature surface that does not have mutual information overlap with MITM-RIS. Next, we aid signal processing interpretation of black-box neural networks by using a symbolic explainable AI (xAI) representation. These symbolic terms of dominant neurons aid feature engineering-based validation and future design of PLS common feature space. Simulation results show that our proposed GAN-based and symbolic-based PL-SKGs can achieve high key agreement rates between legitimate users, and is even resistant to MITM-RIS Eve with the knowledge of legitimate feature generation (NNs or formulas). This therefore paves the way to secure wireless communications with untrusted reflective devices in future 6G.
- [1227] arXiv:2402.06664 (cross-list from cs.CR) [ pdf , ps , html , other ]
-
Title: LLM Agents can Autonomously Hack WebsitesSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: In recent years, large language models (LLMs) have become increasingly capable and can now interact with tools (i.e., call functions), read documents, and recursively call themselves. As a result, these LLMs can now function autonomously as agents. With the rise in capabilities of these agents, recent work has speculated on how LLM agents would affect cybersecurity. However, not much is known about the offensive capabilities of LLM agents.
In this work, we show that LLM agents can autonomously hack websites, performing tasks as complex as blind database schema extraction and SQL injections without human feedback. Importantly, the agent does not need to know the vulnerability beforehand. This capability is uniquely enabled by frontier models that are highly capable of tool use and leveraging extended context. Namely, we show that GPT-4 is capable of such hacks, but existing open-source models are not. Finally, we show that GPT-4 is capable of autonomously finding vulnerabilities in websites in the wild. Our findings raise questions about the widespread deployment of LLMs. - [1228] arXiv:2402.06666 (cross-list from physics.ao-ph) [ pdf , ps , html , other ]
-
Title: Weather Prediction with Diffusion Guided by Realistic Forecast ProcessesSubjects: Atmospheric and Oceanic Physics (physics.ao-ph) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Weather forecasting remains a crucial yet challenging domain, where recently developed models based on deep learning (DL) have approached the performance of traditional numerical weather prediction (NWP) models. However, these DL models, often complex and resource-intensive, face limitations in flexibility post-training and in incorporating NWP predictions, leading to reliability concerns due to potential unphysical predictions. In response, we introduce a novel method that applies diffusion models (DM) for weather forecasting. In particular, our method can achieve both direct and iterative forecasting with the same modeling framework. Our model is not only capable of generating forecasts independently but also uniquely allows for the integration of NWP predictions, even with varying lead times, during its sampling process. The flexibility and controllability of our model empowers a more trustworthy DL system for the general weather community. Additionally, incorporating persistence and climatology data further enhances our model's long-term forecasting stability. Our empirical findings demonstrate the feasibility and generalizability of this approach, suggesting a promising direction for future, more sophisticated diffusion models without the need for retraining.
- [1229] arXiv:2402.06680 (cross-list from physics.soc-ph) [ pdf , ps , html , other ]
-
Title: Social Physics Informed Diffusion Model for Crowd SimulationSubjects: Physics and Society (physics.soc-ph) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Crowd simulation holds crucial applications in various domains, such as urban planning, architectural design, and traffic arrangement. In recent years, physics-informed machine learning methods have achieved state-of-the-art performance in crowd simulation but fail to model the heterogeneity and multi-modality of human movement comprehensively. In this paper, we propose a social physics-informed diffusion model named SPDiff to mitigate the above gap. SPDiff takes both the interactive and historical information of crowds in the current timeframe to reverse the diffusion process, thereby generating the distribution of pedestrian movement in the subsequent timeframe. Inspired by the well-known social physics model, i.e., Social Force, regarding crowd dynamics, we design a crowd interaction module to guide the denoising process and further enhance this module with the equivariant properties of crowd interactions. To mitigate error accumulation in long-term simulations, we propose a multi-frame rollout training algorithm for diffusion modeling. Experiments conducted on two real-world datasets demonstrate the superior performance of SPDiff in terms of macroscopic and microscopic evaluation metrics. Code and appendix are available at this https URL .
- [1230] arXiv:2402.06682 (cross-list from cs.CR) [ pdf , ps , html , other ]
-
Title: Private Knowledge Sharing in Distributed Learning: A SurveyYasas Supeksala , Dinh C. Nguyen , Ming Ding , Thilina Ranbaduge , Calson Chua , Jun Zhang , Jun Li , H. Vincent PoorComments: Manuscript submitted to ACMSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Abstract: The rise of Artificial Intelligence (AI) has revolutionized numerous industries and transformed the way society operates. Its widespread use has led to the distribution of AI and its underlying data across many intelligent systems. In this light, it is crucial to utilize information in learning processes that are either distributed or owned by different entities. As a result, modern data-driven services have been developed to integrate distributed knowledge entities into their outcomes. In line with this goal, the latest AI models are frequently trained in a decentralized manner. Distributed learning involves multiple entities working together to make collective predictions and decisions. However, this collaboration can also bring about security vulnerabilities and challenges. This paper provides an in-depth survey on private knowledge sharing in distributed learning, examining various knowledge components utilized in leading distributed learning architectures. Our analysis sheds light on the most critical vulnerabilities that may arise when using these components in a distributed setting. We further identify and examine defensive strategies for preserving the privacy of these knowledge components and preventing malicious parties from manipulating or accessing the knowledge information. Finally, we highlight several key limitations of knowledge sharing in distributed learning and explore potential avenues for future research.
- [1231] arXiv:2402.06692 (cross-list from eess.IV) [ pdf , ps , html , other ]
-
Title: HistoHDR-Net: Histogram Equalization for Single LDR to HDR Image TranslationComments: Submitted to IEEESubjects: Image and Video Processing (eess.IV) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG); Multimedia (cs.MM)
Abstract: High Dynamic Range (HDR) imaging aims to replicate the high visual quality and clarity of real-world scenes. Due to the high costs associated with HDR imaging, the literature offers various data-driven methods for HDR image reconstruction from Low Dynamic Range (LDR) counterparts. A common limitation of these approaches is missing details in regions of the reconstructed HDR images, which are over- or under-exposed in the input LDR images. To this end, we propose a simple and effective method, HistoHDR-Net, to recover the fine details (e.g., color, contrast, saturation, and brightness) of HDR images via a fusion-based approach utilizing histogram-equalized LDR images along with self-attention guidance. Our experiments demonstrate the efficacy of the proposed approach over the state-of-art methods.
- [1232] arXiv:2402.06694 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Scaling Intelligent Agents in Combat Simulations for WargamingComments: arXiv admin note: text overlap with arXiv:2402.06075Journal-ref: I/ITSEC Conference Proceedings 2023Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Remaining competitive in future conflicts with technologically-advanced competitors requires us to accelerate our research and development in artificial intelligence (AI) for wargaming. More importantly, leveraging machine learning for intelligent combat behavior development will be key to one day achieving superhuman performance in this domain--elevating the quality and accelerating the speed of our decisions in future wars. Although deep reinforcement learning (RL) continues to show promising results in intelligent agent behavior development in games, it has yet to perform at or above the human level in the long-horizon, complex tasks typically found in combat modeling and simulation. Capitalizing on the proven potential of RL and recent successes of hierarchical reinforcement learning (HRL), our research is investigating and extending the use of HRL to create intelligent agents capable of performing effectively in these large and complex simulation environments. Our ultimate goal is to develop an agent capable of superhuman performance that could then serve as an AI advisor to military planners and decision-makers. This papers covers our ongoing approach and the first three of our five research areas aimed at managing the exponential growth of computations that have thus far limited the use of AI in combat simulations: (1) developing an HRL training framework and agent architecture for combat units; (2) developing a multi-model framework for agent decision-making; (3) developing dimension-invariant observation abstractions of the state space to manage the exponential growth of computations; (4) developing an intrinsic rewards engine to enable long-term planning; and (5) implementing this framework into a higher-fidelity combat simulation.
- [1233] arXiv:2402.06696 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: FL-NAS: Towards Fairness of NAS for Resource Constrained Devices via Large Language ModelsComments: ASP-DAC 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Neural Architecture Search (NAS) has become the de fecto tools in the industry in automating the design of deep neural networks for various applications, especially those driven by mobile and edge devices with limited computing resources. The emerging large language models (LLMs), due to their prowess, have also been incorporated into NAS recently and show some promising results. This paper conducts further exploration in this direction by considering three important design metrics simultaneously, i.e., model accuracy, fairness, and hardware deployment efficiency. We propose a novel LLM-based NAS framework, FL-NAS, in this paper, and show experimentally that FL-NAS can indeed find high-performing DNNs, beating state-of-the-art DNN models by orders-of-magnitude across almost all design considerations.
- [1234] arXiv:2402.06697 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Feed-Forward Neural Networks as a Mixed-Integer ProgramSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Abstract: Deep neural networks (DNNs) are widely studied in various applications. A DNN consists of layers of neurons that compute affine combinations, apply nonlinear operations, and produce corresponding activations. The rectified linear unit (ReLU) is a typical nonlinear operator, outputting the max of its input and zero. In scenarios like max pooling, where multiple input values are involved, a fixed-parameter DNN can be modeled as a mixed-integer program (MIP). This formulation, with continuous variables representing unit outputs and binary variables for ReLU activation, finds applications across diverse domains. This study explores the formulation of trained ReLU neurons as MIP and applies MIP models for training neural networks (NNs). Specifically, it investigates interactions between MIP techniques and various NN architectures, including binary DNNs (employing step activation functions) and binarized DNNs (with weights and activations limited to $-1,0,+1$). The research focuses on training and evaluating proposed approaches through experiments on handwritten digit classification models. The comparative study assesses the performance of trained ReLU NNs, shedding light on the effectiveness of MIP formulations in enhancing training processes for NNs.
- [1235] arXiv:2402.06700 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Entropy-Regularized Token-Level Policy Optimization for Large Language ModelsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks. Traditional approaches often depend on meticulously designed prompts, high-quality examples, or additional reward models for in-context learning, supervised fine-tuning, or RLHF. Reinforcement learning (RL) presents a dynamic alternative for LLMs to overcome these dependencies by engaging directly with task-specific environments. Nonetheless, it faces significant hurdles: 1) instability stemming from the exponentially vast action space requiring exploration; 2) challenges in assigning token-level credit based on action-level reward signals, resulting in discord between maximizing rewards and accurately modeling corpus data. In response to these challenges, we introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level. At the heart of ETPO is our novel per-token soft Bellman update, designed to harmonize the RL process with the principles of language modeling. This methodology decomposes the Q-function update from a coarse action-level view to a more granular token-level perspective, backed by theoretical proof of optimization consistency. Crucially, this decomposition renders linear time complexity in action exploration. We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks; results show that ETPO achieves effective performance improvement on the CodeLlama-7B model and surpasses a variant PPO baseline inherited from RLHF. This underlines ETPO's potential as a robust method for refining the interactive decision-making capabilities of LLMs. Our code is open-sourced at this https URL .
- [1236] arXiv:2402.06716 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Dynamic Graph Information BottleneckComments: Accepted by the research tracks of The Web Conference 2024 (WWW 2024)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Dynamic Graphs widely exist in the real world, which carry complicated spatial and temporal feature patterns, challenging their representation learning. Dynamic Graph Neural Networks (DGNNs) have shown impressive predictive abilities by exploiting the intrinsic dynamics. However, DGNNs exhibit limited robustness, prone to adversarial attacks. This paper presents the novel Dynamic Graph Information Bottleneck (DGIB) framework to learn robust and discriminative representations. Leveraged by the Information Bottleneck (IB) principle, we first propose the expected optimal representations should satisfy the Minimal-Sufficient-Consensual (MSC) Condition. To compress redundant as well as conserve meritorious information into latent representation, DGIB iteratively directs and refines the structural and feature information flow passing through graph snapshots. To meet the MSC Condition, we decompose the overall IB objectives into DGIB$_{MS}$ and DGIB$_C$, in which the DGIB$_{MS}$ channel aims to learn the minimal and sufficient representations, with the DGIB$_{MS}$ channel guarantees the predictive consensus. Extensive experiments on real-world and synthetic dynamic graph datasets demonstrate the superior robustness of DGIB against adversarial attacks compared with state-of-the-art baselines in the link prediction task. To the best of our knowledge, DGIB is the first work to learn robust representations of dynamic graphs grounded in the information-theoretic IB principle.
- [1237] arXiv:2402.06733 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: NICE: To Optimize In-Context Examples or Not?Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Recent work shows that in-context learning and optimization of in-context examples (ICE) can significantly improve the accuracy of large language models (LLMs) on a wide range of tasks, leading to an apparent consensus that ICE optimization is crucial for better performance. However, most of these studies assume a fixed or no instruction provided in the prompt. We challenge this consensus by investigating the necessity of optimizing ICE when task-specific instructions are provided and find that there are tasks for which it yields diminishing returns. In particular, using a diverse set of tasks and a systematically created instruction set with gradually added details, we find that as the prompt instruction becomes more detailed, the returns on ICE optimization diminish. To characterize this behavior, we introduce a task-specific metric called Normalized Invariability to Choice of Examples (NICE) that quantifies the learnability of tasks from a given instruction, and provides a heuristic that helps decide whether to optimize instructions or ICE for a new task. Given a task, the proposed metric can reliably predict the utility of optimizing ICE compared to using random ICE.
- [1238] arXiv:2402.06734 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Corruption Robust Offline Reinforcement Learning with Human FeedbackSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: We study data corruption robustness for reinforcement learning with human feedback (RLHF) in an offline setting. Given an offline dataset of pairs of trajectories along with feedback about human preferences, an $\varepsilon$-fraction of the pairs is corrupted (e.g., feedback flipped or trajectory features manipulated), capturing an adversarial attack or noisy human preferences. We aim to design algorithms that identify a near-optimal policy from the corrupted data, with provable guarantees. Existing theoretical works have separately studied the settings of corruption robust RL (learning from scalar rewards directly under corruption) and offline RLHF (learning from human feedback without corruption); however, they are inapplicable to our problem of dealing with corrupted data in offline RLHF setting. To this end, we design novel corruption robust offline RLHF methods under various assumptions on the coverage of the data-generating distributions. At a high level, our methodology robustifies an offline RLHF framework by first learning a reward model along with confidence sets and then learning a pessimistic optimal policy over the confidence set. Our key insight is that learning optimal policy can be done by leveraging an offline corruption-robust RL oracle in different ways (e.g., zero-order oracle or first-order oracle), depending on the data coverage assumptions. To our knowledge, ours is the first work that provides provable corruption robust offline RLHF methods.
- [1239] arXiv:2402.06737 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: ExGRG: Explicitly-Generated Relation Graph for Self-Supervised Representation LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Self-supervised Learning (SSL) has emerged as a powerful technique in pre-training deep learning models without relying on expensive annotated labels, instead leveraging embedded signals in unlabeled data. While SSL has shown remarkable success in computer vision tasks through intuitive data augmentation, its application to graph-structured data poses challenges due to the semantic-altering and counter-intuitive nature of graph augmentations. Addressing this limitation, this paper introduces a novel non-contrastive SSL approach to Explicitly Generate a compositional Relation Graph (ExGRG) instead of relying solely on the conventional augmentation-based implicit relation graph. ExGRG offers a framework for incorporating prior domain knowledge and online extracted information into the SSL invariance objective, drawing inspiration from the Laplacian Eigenmap and Expectation-Maximization (EM). Employing an EM perspective on SSL, our E-step involves relation graph generation to identify candidates to guide the SSL invariance objective, and M-step updates the model parameters by integrating the derived relational information. Extensive experimentation on diverse node classification datasets demonstrates the superiority of our method over state-of-the-art techniques, affirming ExGRG as an effective adoption of SSL for graph representation learning.
- [1240] arXiv:2402.06759 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: A Methodology for Questionnaire Analysis: Insights through Cluster Analysis of an Investor Competition DataComments: 14 pages, 12 figuresSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: In this paper, we propose a methodology for the analysis of questionnaire data along with its application on discovering insights from investor data motivated by a day trading competition. The questionnaire includes categorical questions, which are reduced to binary questions, 'yes' or 'no'. The methodology reduces dimensionality by grouping questions and participants with similar responses using clustering analysis. Rule discovery was performed by using a conversion rate metric. Innovative visual representations were proposed to validate the cluster analysis and the relation discovery between questions. When crossing with financial data, additional insights were revealed related to the recognized clusters.
- [1241] arXiv:2402.06766 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Evaluation Metrics for Text Data Augmentation in NLPSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Recent surveys on data augmentation for natural language processing have reported different techniques and advancements in the field. Several frameworks, tools, and repositories promote the implementation of text data augmentation pipelines. However, a lack of evaluation criteria and standards for method comparison due to different tasks, metrics, datasets, architectures, and experimental settings makes comparisons meaningless. Also, a lack of methods unification exists and text data augmentation research would benefit from unified metrics to compare different augmentation methods. Thus, academics and the industry endeavor relevant evaluation metrics for text data augmentation techniques. The contribution of this work is to provide a taxonomy of evaluation metrics for text augmentation methods and serve as a direction for a unified benchmark. The proposed taxonomy organizes categories that include tools for implementation and metrics calculation. Finally, with this study, we intend to present opportunities to explore the unification and standardization of text data augmentation metrics.
- [1242] arXiv:2402.06772 (cross-list from q-bio.QM) [ pdf , ps , html , other ]
-
Title: Retrosynthesis Prediction via Search in (Hyper) GraphSubjects: Quantitative Methods (q-bio.QM) ; Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
Abstract: Predicting reactants from a specified core product stands as a fundamental challenge within organic synthesis, termed retrosynthesis prediction. Recently, semi-template-based methods and graph-edits-based methods have achieved good performance in terms of both interpretability and accuracy. However, due to their mechanisms these methods cannot predict complex reactions, e.g., reactions with multiple reaction center or attaching the same leaving group to more than one atom. In this study we propose a semi-template-based method, the \textbf{Retro}synthesis via \textbf{S}earch \textbf{i}n (Hyper) \textbf{G}raph (RetroSiG) framework to alleviate these limitations. In the proposed method, we turn the reaction center identification and the leaving group completion tasks as tasks of searching in the product molecular graph and leaving group hypergraph respectively. As a semi-template-based method RetroSiG has several advantages. First, RetroSiG is able to handle the complex reactions mentioned above by its novel search mechanism. Second, RetroSiG naturally exploits the hypergraph to model the implicit dependencies between leaving groups. Third, RetroSiG makes full use of the prior, i.e., one-hop constraint. It reduces the search space and enhances overall performance. Comprehensive experiments demonstrated that RetroSiG achieved competitive results. Furthermore, we conducted experiments to show the capability of RetroSiG in predicting complex reactions. Ablation experiments verified the efficacy of specific elements, such as the one-hop constraint and the leaving group hypergraph.
- [1243] arXiv:2402.06784 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Transfer learning with generative models for object detection on limited datasetsComments: 24 pages, 15 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Disordered Systems and Neural Networks (cond-mat.dis-nn); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Numerical Analysis (math.NA)
Abstract: The availability of data is limited in some fields, especially for object detection tasks, where it is necessary to have correctly labeled bounding boxes around each object. A notable example of such data scarcity is found in the domain of marine biology, where it is useful to develop methods to automatically detect submarine species for environmental monitoring. To address this data limitation, the state-of-the-art machine learning strategies employ two main approaches. The first involves pretraining models on existing datasets before generalizing to the specific domain of interest. The second strategy is to create synthetic datasets specifically tailored to the target domain using methods like copy-paste techniques or ad-hoc simulators. The first strategy often faces a significant domain shift, while the second demands custom solutions crafted for the specific task. In response to these challenges, here we propose a transfer learning framework that is valid for a generic scenario. In this framework, generated images help to improve the performances of an object detector in a few-real data regime. This is achieved through a diffusion-based generative model that was pretrained on large generic datasets, and is not trained on the task-specific domain. We validate our approach on object detection tasks, specifically focusing on fishes in an underwater environment, and on the more common domain of cars in an urban setting. Our method achieves detection performance comparable to models trained on thousands of images, using only a few hundreds of input data. Our results pave the way for new generative AI-based protocols for machine learning applications in various domains, for instance ranging from geophysics to biology and medicine.
- [1244] arXiv:2402.06794 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Is it safe to cross? Interpretable Risk Assessment with GPT-4V for Safety-Aware Street CrossingSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Safely navigating street intersections is a complex challenge for blind and low-vision individuals, as it requires a nuanced understanding of the surrounding context - a task heavily reliant on visual cues. Traditional methods for assisting in this decision-making process often fall short, lacking the ability to provide a comprehensive scene analysis and safety level. This paper introduces an innovative approach that leverages large multimodal models (LMMs) to interpret complex street crossing scenes, offering a potential advancement over conventional traffic signal recognition techniques. By generating a safety score and scene description in natural language, our method supports safe decision-making for the blind and low-vision individuals. We collected crosswalk intersection data that contains multiview egocentric images captured by a quadruped robot and annotated the images with corresponding safety scores based on our predefined safety score categorization. Grounded on the visual knowledge, extracted from images, and text prompt, we evaluate a large multimodal model for safety score prediction and scene description. Our findings highlight the reasoning and safety score prediction capabilities of a LMM, activated by various prompts, as a pathway to developing a trustworthy system, crucial for applications requiring reliable decision-making support.
- [1245] arXiv:2402.06810 (cross-list from cs.SD) [ pdf , ps , html , other ]
-
Title: Evaluating Co-Creativity using Total Information FlowSubjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Information Theory (cs.IT); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Abstract: Co-creativity in music refers to two or more musicians or musical agents interacting with one another by composing or improvising music. However, this is a very subjective process and each musician has their own preference as to which improvisation is better for some context. In this paper, we aim to create a measure based on total information flow to quantitatively evaluate the co-creativity process in music. In other words, our measure is an indication of how "good" a creative musical process is. Our main hypothesis is that a good musical creation would maximize information flow between the participants captured by music voices recorded in separate tracks. We propose a method to compute the information flow using pre-trained generative models as entropy estimators. We demonstrate how our method matches with human perception using a qualitative study.
- [1246] arXiv:2402.06859 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: LiRank: Industrial Large Scale Ranking Models at LinkedInFedor Borisyuk , Mingzhou Zhou , Qingquan Song , Siyu Zhu , Birjodh Tiwana , Ganesh Parameswaran , Siddharth Dangi , Lars Hertel , Qiang Xiao , Xiaochen Hou , Yunbo Ouyang , Aman Gupta , Sheallika Singh , Dan Liu , Hailing Cheng , Lei Le , Jonathan Hung , Sathiya Keerthi , Ruoyan Wang , Fengyu Zhang , Mohit Kothari , Chen Zhu , Daqi Sun , Yun Dai , Xun Luan , Sirou Zhu , Zhiwei Wang , Neil Daftary , Qianqi Shen , Chengming Jiang , Haichao Wei , Maneesh Varshney , Amol Ghoting , Souvik GhoshSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract: We present LiRank, a large-scale ranking framework at LinkedIn that brings to production state-of-the-art modeling architectures and optimization methods. We unveil several modeling improvements, including Residual DCN, which adds attention and residual connections to the famous DCNv2 architecture. We share insights into combining and tuning SOTA architectures to create a unified model, including Dense Gating, Transformers and Residual DCN. We also propose novel techniques for calibration and describe how we productionalized deep learning based explore/exploit methods. To enable effective, production-grade serving of large ranking models, we detail how to train and compress models using quantization and vocabulary compression. We provide details about the deployment setup for large-scale use cases of Feed ranking, Jobs Recommendations, and Ads click-through rate (CTR) prediction. We summarize our learnings from various A/B tests by elucidating the most effective technical approaches. These ideas have contributed to relative metrics improvements across the board at LinkedIn: +0.5% member sessions in the Feed, +1.76% qualified job applications for Jobs search and recommendations, and +4.3% for Ads CTR. We hope this work can provide practical insights and solutions for practitioners interested in leveraging large-scale deep ranking systems.
- [1247] arXiv:2402.06864 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Discriminative Adversarial UnlearningComments: 13 pages including references, 2 tables, 2 figures and 1 algorithmSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: We introduce a novel machine unlearning framework founded upon the established principles of the min-max optimization paradigm. We capitalize on the capabilities of strong Membership Inference Attacks (MIA) to facilitate the unlearning of specific samples from a trained model. We consider the scenario of two networks, the attacker $\mathbf{A}$ and the trained defender $\mathbf{D}$ pitted against each other in an adversarial objective, wherein the attacker aims at teasing out the information of the data to be unlearned in order to infer membership, and the defender unlearns to defend the network against the attack, whilst preserving its general performance. The algorithm can be trained end-to-end using backpropagation, following the well known iterative min-max approach in updating the attacker and the defender. We additionally incorporate a self-supervised objective effectively addressing the feature space discrepancies between the forget set and the validation set, enhancing unlearning performance. Our proposed algorithm closely approximates the ideal benchmark of retraining from scratch for both random sample forgetting and class-wise forgetting schemes on standard machine-unlearning datasets. Specifically, on the class unlearning scheme, the method demonstrates near-optimal performance and comprehensively overcomes known methods over the random sample forgetting scheme across all metrics and multiple network pruning strategies.
- [1248] arXiv:2402.06871 (cross-list from cs.IR) [ pdf , ps , html , other ]
-
Title: Non-autoregressive Generative Models for Reranking RecommendationComments: Work in progressSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI)
Abstract: In a multi-stage recommendation system, reranking plays a crucial role by modeling the intra-list correlations among items.The key challenge of reranking lies in the exploration of optimal sequences within the combinatorial space of permutations. Recent research proposes a generator-evaluator learning paradigm, where the generator generates multiple feasible sequences and the evaluator picks out the best sequence based on the estimated listwise score. Generator is of vital importance, and generative models are well-suited for the generator function. Current generative models employ an autoregressive strategy for sequence generation. However, deploying autoregressive models in real-time industrial systems is challenging. Hence, we propose a Non-AutoRegressive generative model for reranking Recommendation (NAR4Rec) designed to enhance efficiency and effectiveness. To address challenges related to sparse training samples and dynamic candidates impacting model convergence, we introduce a matching model. Considering the diverse nature of user feedback, we propose a sequence-level unlikelihood training objective to distinguish feasible from unfeasible sequences. Additionally, to overcome the lack of dependency modeling in non-autoregressive models regarding target items, we introduce contrastive decoding to capture correlations among these items. Extensive offline experiments on publicly available datasets validate the superior performance of our proposed approach compared to the existing state-of-the-art reranking methods. Furthermore, our method has been fully deployed in a popular video app Kuaishou with over 300 million daily active users, significantly enhancing online recommendation quality, and demonstrating the effectiveness and efficiency of our approach.
- [1249] arXiv:2402.06894 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: GenTranslate: Large Language Models are Generative Multilingual Speech and Machine TranslatorsComments: 17 pages. This work is open sourced at: this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract: Recent advances in large language models (LLMs) have stepped forward the development of multilingual speech and machine translation by its reduced representation errors and incorporated external knowledge. However, both translation tasks typically utilize beam search decoding and top-1 hypothesis selection for inference. These techniques struggle to fully exploit the rich information in the diverse N-best hypotheses, making them less optimal for translation tasks that require a single, high-quality output sequence. In this paper, we propose a new generative paradigm for translation tasks, namely "GenTranslate", which builds upon LLMs to generate better results from the diverse translation versions in N-best list. Leveraging the rich linguistic knowledge and strong reasoning abilities of LLMs, our new paradigm can integrate the rich information in N-best candidates to generate a higher-quality translation result. Furthermore, to support LLM finetuning, we build and release a HypoTranslate dataset that contains over 592K hypotheses-translation pairs in 11 languages. Experiments on various speech and machine translation benchmarks (e.g., FLEURS, CoVoST-2, WMT) demonstrate that our GenTranslate significantly outperforms the state-of-the-art model.
- [1250] arXiv:2402.06900 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Can LLMs Recognize Toxicity? Structured Toxicity Investigation Framework and Semantic-Based MetricComments: 8 page longSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In the pursuit of developing Large Language Models (LLMs) that adhere to societal standards, it is imperative to discern the existence of toxicity in the generated text. The majority of existing toxicity metrics rely on encoder models trained on specific toxicity datasets. However, these encoders are susceptible to out-of-distribution (OOD) problems and depend on the definition of toxicity assumed in a dataset. In this paper, we introduce an automatic robust metric grounded on LLMs to distinguish whether model responses are toxic. We start by analyzing the toxicity factors, followed by examining the intrinsic toxic attributes of LLMs to ascertain their suitability as evaluators. Subsequently, we evaluate our metric, LLMs As ToxiciTy Evaluators (LATTE), on evaluation datasets.The empirical results indicate outstanding performance in measuring toxicity, improving upon state-of-the-art metrics by 12 points in F1 score without training procedure. We also show that upstream toxicity has an influence on downstream metrics.
- [1251] arXiv:2402.06908 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Topological Neural Networks: Mitigating the Bottlenecks of Graph Neural Networks via Higher-Order InteractionsComments: PhD thesis, 135 pages, 51 figures, 11 tablesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The irreducible complexity of natural phenomena has led Graph Neural Networks to be employed as a standard model to perform representation learning tasks on graph-structured data. While their capacity to capture local and global patterns is remarkable, the implications associated with long-range and higher-order dependencies pose considerable challenges to such models. This work starts with a theoretical framework to reveal the impact of network's width, depth, and graph topology on the over-squashing phenomena in message-passing neural networks. Then, the work drifts towards, higher-order interactions and multi-relational inductive biases via Topological Neural Networks. Such models propagate messages through higher-dimensional structures, providing shortcuts or additional routes for information flow. With this construction, the underlying computational graph is no longer coupled with the input graph structure, thus mitigating the aforementioned bottlenecks while accounting also for higher-order interactions. Inspired by Graph Attention Networks, two topological attention networks are proposed: Simplicial and Cell Attention Networks. The rationale behind these architecture is to leverage the extended notion of neighbourhoods provided by the arrangement of groups of nodes within a simplicial or cell complex to design anisotropic aggregations able to measure the importance of the information coming from different regions of the domain. By doing so, they capture dependencies that conventional Graph Neural Networks might miss. Finally, a multi-way communication scheme is introduced with Enhanced Cellular Isomorphism Networks, which augment topological message passing schemes to enable a direct interactions among groups of nodes arranged in ring-like structures.
- [1252] arXiv:2402.06912 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Solving Deep Reinforcement Learning Benchmarks with Linear Policy NetworksSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Although Deep Reinforcement Learning (DRL) methods can learn effective policies for challenging problems such as Atari games and robotics tasks, algorithms are complex and training times are often long. This study investigates how evolution strategies (ES) perform compared to gradient-based deep reinforcement learning methods. We use ES to optimize the weights of a neural network via neuroevolution, performing direct policy search. We benchmark both regular networks and policy networks consisting of a single linear layer from observations to actions; for three classical ES methods and for three gradient-based methods such as PPO. Our results reveal that ES can find effective linear policies for many RL benchmark tasks, in contrast to DRL methods that can only find successful policies using much larger networks, suggesting that current benchmarks are easier to solve than previously assumed. Interestingly, also for higher complexity tasks, ES achieves results comparable to gradient-based DRL algorithms. Furthermore, we find that by directly accessing the memory state of the game, ES are able to find successful policies in Atari, outperforming DQN. While gradient-based methods have dominated the field in recent years, ES offers an alternative that is easy to implement, parallelize, understand, and tune.
- [1253] arXiv:2402.06918 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Generating Chain-of-Thoughts with a Direct Pairwise-Comparison Approach to Searching for the Most Promising Intermediate ThoughtSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: To improve the ability of the large language model (LLMs) to handle complex reasoning problems, chain-of-thoughts (CoT) methods were proposed to guide LLMs to reason step-by-step, facilitating problem solving from simple to complex tasks. State-of-the-art approaches for generating such a chain involve interactive collaboration, where the learner generates candidate intermediate thoughts, evaluated by the LLM, guiding the generation of subsequent thoughts. However, a widespread yet understudied problem is that the evaluation from the LLM is typically noisy and unreliable, potentially misleading the generation process in selecting promising intermediate thoughts. In this paper, motivated by Vapnik's principle, we propose a novel comparison-based CoT generation algorithm that directly identifies the most promising thoughts with the noisy feedback from the LLM. In each round, we randomly pair intermediate thoughts and directly prompt the LLM to select the more promising one from each pair, allowing us to identify the most promising thoughts through an iterative process. To further model the noise in the comparison, we resort to the techniques of ensemble and dueling bandits and propose two variants of the proposed algorithm. Experiments on three real-world mathematical and reasoning tasks demonstrate the effectiveness of our proposed algorithm and verify the rationale of the direct pairwise comparison.
- [1254] arXiv:2402.06931 (cross-list from cs.NI) [ pdf , ps , html , other ]
-
Title: ORIENT: A Priority-Aware Energy-Efficient Approach for Latency-Sensitive Applications in 6GComments: Conference, 6 pages, 2 figures, 28 equations, 1 table, 1 algorithm, and 16 referencesSubjects: Networking and Internet Architecture (cs.NI) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Abstract: Anticipation for 6G's arrival comes with growing concerns about increased energy consumption in computing and networking. The expected surge in connected devices and resource-demanding applications presents unprecedented challenges for energy resources. While sustainable resource allocation strategies have been discussed in the past, these efforts have primarily focused on single-domain orchestration or ignored the unique requirements posed by 6G. To address this gap, we investigate the joint problem of service instance placement and assignment, path selection, and request prioritization, dubbed PIRA. The objective function is to maximize the system's overall profit as a function of the number of concurrently supported requests while simultaneously minimizing energy consumption over an extended period of time. In addition, end-to-end latency requirements and resource capacity constraints are considered for computing and networking resources, where queuing theory is utilized to estimate the Age of Information (AoI) for requests. After formulating the problem in a non-linear fashion, we prove its NP-hardness and propose a method, denoted ORIENT. This method is based on the Double Dueling Deep Q-Learning (D3QL) mechanism and leverages Graph Neural Networks (GNNs) for state encoding. Extensive numerical simulations demonstrate that ORIENT yields near-optimal solutions for varying system sizes and request counts.
- [1255] arXiv:2402.06938 (cross-list from cs.DC) [ pdf , ps , other ]
-
Title: Efficient Resource Scheduling for Distributed Infrastructures Using Negotiation CapabilitiesComments: Accepted in IEEE CLOUD 2023. 13 pages, 5 figuresSubjects: Distributed, Parallel, and Cluster Computing (cs.DC) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: In the past few decades, the rapid development of information and internet technologies has spawned massive amounts of data and information. The information explosion drives many enterprises or individuals to seek to rent cloud computing infrastructure to put their applications in the cloud. However, the agreements reached between cloud computing providers and clients are often not efficient. Many factors affect the efficiency, such as the idleness of the providers' cloud computing infrastructure, and the additional cost to the clients. One possible solution is to introduce a comprehensive, bargaining game (a type of negotiation), and schedule resources according to the negotiation results. We propose an agent-based auto-negotiation system for resource scheduling based on fuzzy logic. The proposed method can complete a one-to-one auto-negotiation process and generate optimal offers for the provider and client. We compare the impact of different member functions, fuzzy rule sets, and negotiation scenario cases on the offers to optimize the system. It can be concluded that our proposed method can utilize resources more efficiently and is interpretable, highly flexible, and customizable. We successfully train machine learning models to replace the fuzzy negotiation system to improve processing speed. The article also highlights possible future improvements to the proposed system and machine learning models. All the codes and data are available in the open-source repository.
- [1256] arXiv:2402.06945 (cross-list from cs.MM) [ pdf , ps , html , other ]
-
Title: Evaluation Metrics for Automated Typographic Poster GenerationComments: Paper accepted be presented in the 13th International Conference Artificial Intelligence in Music, Sound, Art and Design -- EvoMUSART 2024, Held as Part of EvoStar 2024, Aberystwyth, Wales, United Kingdom, April 3\textendash{}5, 2024Subjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Abstract: Computational Design approaches facilitate the generation of typographic design, but evaluating these designs remains a challenging task. In this paper, we propose a set of heuristic metrics for typographic design evaluation, focusing on their legibility, which assesses the text visibility, aesthetics, which evaluates the visual quality of the design, and semantic features, which estimate how effectively the design conveys the content semantics. We experiment with a constrained evolutionary approach for generating typographic posters, incorporating the proposed evaluation metrics with varied setups, and treating the legibility metrics as constraints. We also integrate emotion recognition to identify text semantics automatically and analyse the performance of the approach and the visual characteristics outputs.
- [1257] arXiv:2402.06952 (cross-list from quant-ph) [ pdf , ps , html , other ]
-
Title: Estimating the Effect of Crosstalk Error on Circuit Fidelity Using Noisy Intermediate-Scale Quantum DevicesSubjects: Quantum Physics (quant-ph) ; Artificial Intelligence (cs.AI)
Abstract: Current advancements in technology have focused the attention of the quantum computing community toward exploring the potential of near-term devices whose computing power surpasses that of classical computers in practical applications. An unresolved central question revolves around whether the inherent noise in these devices can be overcome or whether any potential quantum advantage would be limited. There is no doubt that crosstalk is one of the main sources of noise in noisy intermediate-scale quantum (NISQ) systems, and it poses a fundamental challenge to hardware designs. Crosstalk between parallel instructions can corrupt quantum states and cause incorrect program execution. In this study, we present a comprehensive analysis of the crosstalk error effect on NISQ computers. Our approach is extremely straightforward and practical for characterizing the crosstalk error of various multi-qubit devices. In particular, we combine the randomized benchmarking (RB) and simultaneous randomized benchmarking (SRB) protocol to characterize the crosstalk error from the correlation controlled-NOT (CNOT) gate. We demonstrate this protocol experimentally on 5- \& 7-qubit devices. Our results demonstrate the crosstalk error model of two different IBM quantum devices over the experimental week and compare the error variation against the machine, number of qubits, quantum volume, processor, and topology of the IBM quantum devices. We then confirm the improvement in the circuit fidelity on different benchmarks by up to 3.06x via inserting an instruction barrier, as compared with an IBM quantum noisy device which offers near-optimal crosstalk mitigation in practice. Most importantly, we provide insight to ensure that the quantum operation can perform its quantum magic undisturbed.
- [1258] arXiv:2402.06955 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Training dynamics in Physics-Informed Neural Networks with feature mappingSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)
Abstract: Physics-Informed Neural Networks (PINNs) have emerged as an iconic machine learning approach for solving Partial Differential Equations (PDEs). Although its variants have achieved significant progress, the empirical success of utilising feature mapping from the wider Implicit Neural Representations studies has been substantially neglected. We investigate the training dynamics of PINNs with a feature mapping layer via the limiting Conjugate Kernel and Neural Tangent Kernel, which sheds light on the convergence and generalisation of the model. We also show the inadequacy of commonly used Fourier-based feature mapping in some scenarios and propose the conditional positive definite Radial Basis Function as a better alternative. The empirical results reveal the efficacy of our method in diverse forward and inverse problem sets. This simple technique can be easily implemented in coordinate input networks and benefits the broad PINNs research.
- [1259] arXiv:2402.06957 (cross-list from cs.CR) [ pdf , ps , html , other ]
-
Title: Architectural Neural Backdoors from First PrinciplesSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: While previous research backdoored neural networks by changing their parameters, recent work uncovered a more insidious threat: backdoors embedded within the definition of the network's architecture. This involves injecting common architectural components, such as activation functions and pooling layers, to subtly introduce a backdoor behavior that persists even after (full re-)training. However, the full scope and implications of architectural backdoors have remained largely unexplored. Bober-Irizar et al. [2023] introduced the first architectural backdoor; they showed how to create a backdoor for a checkerboard pattern, but never explained how to target an arbitrary trigger pattern of choice. In this work we construct an arbitrary trigger detector which can be used to backdoor an architecture with no human supervision. This leads us to revisit the concept of architecture backdoors and taxonomise them, describing 12 distinct types. To gauge the difficulty of detecting such backdoors, we conducted a user study, revealing that ML developers can only identify suspicious components in common model definitions as backdoors in 37% of cases, while they surprisingly preferred backdoored models in 33% of cases. To contextualize these results, we find that language models outperform humans at the detection of backdoors. Finally, we discuss defenses against architectural backdoors, emphasizing the need for robust and comprehensive strategies to safeguard the integrity of ML systems.
- [1260] arXiv:2402.06963 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Tree Ensembles for Contextual BanditsComments: The first two authors contributed equally to this workSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: We propose a novel framework for contextual multi-armed bandits based on tree ensembles. Our framework integrates two widely used bandit methods, Upper Confidence Bound and Thompson Sampling, for both standard and combinatorial settings. We demonstrate the effectiveness of our framework via several experimental studies, employing XGBoost, a popular tree ensemble method. Compared to state-of-the-art methods based on neural networks, our methods exhibit superior performance in terms of both regret minimization and computational runtime, when applied to benchmark datasets and the real-world application of navigation over road networks.
- [1261] arXiv:2402.06967 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for DialogueComments: Work in progressSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Tuning pretrained language models for dialogue generation has been a prevalent paradigm for building capable dialogue agents. Yet, traditional tuning narrowly views dialogue generation as resembling other language generation tasks, ignoring the role disparities between two speakers and the multi-round interactive process that dialogues ought to be. Such a manner leads to unsatisfactory chat consistency of the built agent. In this work, we emphasize the interactive, communicative nature of dialogue and argue that it is more feasible to model the speaker roles of agent and user separately, enabling the agent to adhere to its role consistently. We propose an efficient Multi-round Interactive Dialogue Tuning (Midi-Tuning) framework. It models the agent and user individually with two adapters built upon large language models, where they utilize utterances round by round in alternating order and are tuned via a round-level memory caching mechanism. Extensive experiments demonstrate that, our framework performs superior to traditional fine-tuning and harbors the tremendous potential for improving dialogue consistency.
- [1262] arXiv:2402.06973 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Event-Keyed SummarizationComments: ARR short paper (under review)Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: We introduce event-keyed summarization (EKS), a novel task that marries traditional summarization and document-level event extraction, with the goal of generating a contextualized summary for a specific event, given a document and an extracted event structure. We introduce a dataset for this task, MUCSUM, consisting of summaries of all events in the classic MUC-4 dataset, along with a set of baselines that comprises both pretrained LM standards in the summarization literature, as well as larger frontier models. We show that ablations that reduce EKS to traditional summarization or structure-to-text yield inferior summaries of target events and that MUCSUM is a robust benchmark for this task. Lastly, we conduct a human evaluation of both reference and model summaries, and provide some detailed analysis of the results.
- [1263] arXiv:2402.06982 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Treatment-wise Glioblastoma Survival Inference with Multi-parametric Preoperative MRIComments: SPIE Medical Imaging 2024: Computer-Aided DiagnosisSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Medical Physics (physics.med-ph)
Abstract: In this work, we aim to predict the survival time (ST) of glioblastoma (GBM) patients undergoing different treatments based on preoperative magnetic resonance (MR) scans. The personalized and precise treatment planning can be achieved by comparing the ST of different treatments. It is well established that both the current status of the patient (as represented by the MR scans) and the choice of treatment are the cause of ST. While previous related MR-based glioblastoma ST studies have focused only on the direct mapping of MR scans to ST, they have not included the underlying causal relationship between treatments and ST. To address this limitation, we propose a treatment-conditioned regression model for glioblastoma ST that incorporates treatment information in addition to MR scans. Our approach allows us to effectively utilize the data from all of the treatments in a unified manner, rather than having to train separate models for each of the treatments. Furthermore, treatment can be effectively injected into each convolutional layer through the adaptive instance normalization we employ. We evaluate our framework on the BraTS20 ST prediction task. Three treatment options are considered: Gross Total Resection (GTR), Subtotal Resection (STR), and no resection. The evaluation results demonstrate the effectiveness of injecting the treatment for estimating GBM survival.
- [1264] arXiv:2402.06985 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: OSSAR: Towards Open-Set Surgical Activity Recognition in Robot-assisted SurgeryLong Bai , Guankun Wang , Jie Wang , Xiaoxiao Yang , Huxin Gao , Xin Liang , An Wang , Mobarakol Islam , Hongliang RenComments: To appear in IEEE ICRA 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: In the realm of automated robotic surgery and computer-assisted interventions, understanding robotic surgical activities stands paramount. Existing algorithms dedicated to surgical activity recognition predominantly cater to pre-defined closed-set paradigms, ignoring the challenges of real-world open-set scenarios. Such algorithms often falter in the presence of test samples originating from classes unseen during training phases. To tackle this problem, we introduce an innovative Open-Set Surgical Activity Recognition (OSSAR) framework. Our solution leverages the hyperspherical reciprocal point strategy to enhance the distinction between known and unknown classes in the feature space. Additionally, we address the issue of over-confidence in the closed set by refining model calibration, avoiding misclassification of unknown classes as known ones. To support our assertions, we establish an open-set surgical activity benchmark utilizing the public JIGSAWS dataset. Besides, we also collect a novel dataset on endoscopic submucosal dissection for surgical activity tasks. Extensive comparisons and ablation experiments on these datasets demonstrate the significant outperformance of our method over existing state-of-the-art approaches. Our proposed solution can effectively address the challenges of real-world surgical scenarios. Our code is publicly accessible at this https URL .
- [1265] arXiv:2402.06992 (cross-list from q-bio.NC) [ pdf , ps , html , other ]
-
Title: A Rational Analysis of the Speech-to-Song IllusionComments: 7 pages, 5 figuresSubjects: Neurons and Cognition (q-bio.NC) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Applications (stat.AP)
Abstract: The speech-to-song illusion is a robust psychological phenomenon whereby a spoken sentence sounds increasingly more musical as it is repeated. Despite decades of research, a complete formal account of this transformation is still lacking, and some of its nuanced characteristics, namely, that certain phrases appear to transform while others do not, is not well understood. Here we provide a formal account of this phenomenon, by recasting it as a statistical inference whereby a rational agent attempts to decide whether a sequence of utterances is more likely to have been produced in a song or speech. Using this approach and analyzing song and speech corpora, we further introduce a novel prose-to-lyrics illusion that is purely text-based. In this illusion, simply duplicating written sentences makes them appear more like song lyrics. We provide robust evidence for this new illusion in both human participants and large language models.
- [1266] arXiv:2402.07002 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Clients Collaborate: Flexible Differentially Private Federated Learning with Guaranteed Improvement of Utility-Privacy Trade-offComments: 22 pages, 8 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: To defend against privacy leakage of user data, differential privacy is widely used in federated learning, but it is not free. The addition of noise randomly disrupts the semantic integrity of the model and this disturbance accumulates with increased communication rounds. In this paper, we introduce a novel federated learning framework with rigorous privacy guarantees, named FedCEO, designed to strike a trade-off between model utility and user privacy by letting clients ''Collaborate with Each Other''. Specifically, we perform efficient tensor low-rank proximal optimization on stacked local model parameters at the server, demonstrating its capability to flexibly truncate high-frequency components in spectral space. This implies that our FedCEO can effectively recover the disrupted semantic information by smoothing the global semantic space for different privacy settings and continuous training processes. Moreover, we improve the SOTA utility-privacy trade-off bound by an order of $\sqrt{d}$, where $d$ is the input dimension. We illustrate our theoretical results with experiments on representative image datasets. It observes significant performance improvements and strict privacy guarantees under different privacy settings.
- [1267] arXiv:2402.07011 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: FedImpro: Measuring and Improving Client Update in Federated LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: Federated Learning (FL) models often experience client drift caused by heterogeneous data, where the distribution of data differs across clients. To address this issue, advanced research primarily focuses on manipulating the existing gradients to achieve more consistent client models. In this paper, we present an alternative perspective on client drift and aim to mitigate it by generating improved local models. First, we analyze the generalization contribution of local training and conclude that this generalization contribution is bounded by the conditional Wasserstein distance between the data distribution of different clients. Then, we propose FedImpro, to construct similar conditional distributions for local training. Specifically, FedImpro decouples the model into high-level and low-level components, and trains the high-level portion on reconstructed feature distributions. This approach enhances the generalization contribution and reduces the dissimilarity of gradients in FL. Experimental results show that FedImpro can help FL defend against data heterogeneity and enhance the generalization performance of the model.
- [1268] arXiv:2402.07023 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Gemini Goes to Med School: Exploring the Capabilities of Multimodal Large Language Models on Medical Challenge Problems & HallucinationsComments: Preprint version, Under ReviewSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Abstract: Large language models have the potential to be valuable in the healthcare industry, but it's crucial to verify their safety and effectiveness through rigorous evaluation. For this purpose, we comprehensively evaluated both open-source LLMs and Google's new multimodal LLM called Gemini across Medical reasoning, hallucination detection, and Medical Visual Question Answering tasks. While Gemini showed competence, it lagged behind state-of-the-art models like MedPaLM 2 and GPT-4 in diagnostic accuracy. Additionally, Gemini achieved an accuracy of 61.45\% on the medical VQA dataset, significantly lower than GPT-4V's score of 88\%. Our analysis revealed that Gemini is highly susceptible to hallucinations, overconfidence, and knowledge gaps, which indicate risks if deployed uncritically. We also performed a detailed analysis by medical subject and test type, providing actionable feedback for developers and clinicians. To mitigate risks, we applied prompting strategies that improved performance. Additionally, we facilitated future research and development by releasing a Python module for medical LLM evaluation and establishing a dedicated leaderboard on Hugging Face for medical domain LLMs. Python module can be found at this https URL
- [1269] arXiv:2402.07031 (cross-list from cs.SE) [ pdf , ps , html , other ]
-
Title: Instance-Level Safety-Aware Fidelity of Synthetic Data and Its CalibrationSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Modeling and calibrating the fidelity of synthetic data is paramount in shaping the future of safe and reliable self-driving technology by offering a cost-effective and scalable alternative to real-world data collection. We focus on its role in safety-critical applications, introducing four types of instance-level fidelity that go beyond mere visual input characteristics. The aim is to ensure that applying testing on synthetic data can reveal real-world safety issues, and the absence of safety-critical issues when testing under synthetic data can provide a strong safety guarantee in real-world behavior. We suggest an optimization method to refine the synthetic data generator, reducing fidelity gaps identified by deep learning components. Experiments show this tuning enhances the correlation between safety-critical errors in synthetic and real data.
- [1270] arXiv:2402.07033 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts ModelsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Operating Systems (cs.OS)
Abstract: Large Language Models (LLMs) based on Mixture-of-Experts (MoE) architecture are showing promising performance on various tasks. However, running them on resource-constrained settings, where GPU memory resources are not abundant, is challenging due to huge model sizes. Existing systems that offload model weights to CPU memory suffer from the significant overhead of frequently moving data between CPU and GPU. In this paper, we propose Fiddler, a resource-efficient inference engine with CPU-GPU orchestration for MoE models. The key idea of Fiddler is to use the computation ability of the CPU to minimize the data movement between the CPU and GPU. Our evaluation shows that Fiddler can run the uncompressed Mixtral-8x7B model, which exceeds 90GB in parameters, to generate over $3$ tokens per second on a single GPU with 24GB memory, showing an order of magnitude improvement over existing methods. The code of Fiddler is publicly available at \url{ this https URL }
- [1271] arXiv:2402.07035 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Distilling Symbolic Priors for Concept Learning into Neural NetworksComments: 8 pages, 6 figures, 4 tablesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Humans can learn new concepts from a small number of examples by drawing on their inductive biases. These inductive biases have previously been captured by using Bayesian models defined over symbolic hypothesis spaces. Is it possible to create a neural network that displays the same inductive biases? We show that inductive biases that enable rapid concept learning can be instantiated in artificial neural networks by distilling a prior distribution from a symbolic Bayesian model via meta-learning, an approach for extracting the common structure from a set of tasks. By generating the set of tasks used in meta-learning from the prior distribution of a Bayesian model, we are able to transfer that prior into a neural network. We use this approach to create a neural network with an inductive bias towards concepts expressed as short logical formulas. Analyzing results from previous behavioral experiments in which people learned logical concepts from a few examples, we find that our meta-trained models are highly aligned with human performance.
- [1272] arXiv:2402.07043 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: A Tale of Tails: Model Collapse as a Change of Scaling LawsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: As AI model size grows, neural scaling laws have become a crucial tool to predict the improvements of large models when increasing capacity and the size of original (human or natural) training data. Yet, the widespread use of popular models means that the ecosystem of online data and text will co-evolve to progressively contain increased amounts of synthesized data. In this paper we ask: How will the scaling laws change in the inevitable regime where synthetic data makes its way into the training corpus? Will future models, still improve, or be doomed to degenerate up to total (model) collapse? We develop a theoretical framework of model collapse through the lens of scaling laws. We discover a wide range of decay phenomena, analyzing loss of scaling, shifted scaling with number of generations, the ''un-learning" of skills, and grokking when mixing human and synthesized data. Our theory is validated by large-scale experiments with a transformer on an arithmetic task and text generation using the large language model Llama2.
- [1273] arXiv:2402.07051 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: $L^*LM$: Learning Automata from Examples using Natural Language OraclesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Formal Languages and Automata Theory (cs.FL)
Abstract: Expert demonstrations have proven an easy way to indirectly specify complex tasks. Recent algorithms even support extracting unambiguous formal specifications, e.g. deterministic finite automata (DFA), from demonstrations. Unfortunately, these techniques are generally not sample efficient. In this work, we introduce $L^*LM$, an algorithm for learning DFAs from both demonstrations and natural language. Due to the expressivity of natural language, we observe a significant improvement in the data efficiency of learning DFAs from expert demonstrations. Technically, $L^*LM$ leverages large language models to answer membership queries about the underlying task. This is then combined with recent techniques for transforming learning from demonstrations into a sequence of labeled example learning problems. In our experiments, we observe the two modalities complement each other, yielding a powerful few-shot learner.
- [1274] arXiv:2402.07069 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Using Large Language Models to Automate and Expedite Reinforcement Learning with Reward MachineSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: We present LARL-RM (Large language model-generated Automaton for Reinforcement Learning with Reward Machine) algorithm in order to encode high-level knowledge into reinforcement learning using automaton to expedite the reinforcement learning. Our method uses Large Language Models (LLM) to obtain high-level domain-specific knowledge using prompt engineering instead of providing the reinforcement learning algorithm directly with the high-level knowledge which requires an expert to encode the automaton. We use chain-of-thought and few-shot methods for prompt engineering and demonstrate that our method works using these approaches. Additionally, LARL-RM allows for fully closed-loop reinforcement learning without the need for an expert to guide and supervise the learning since LARL-RM can use the LLM directly to generate the required high-level knowledge for the task at hand. We also show the theoretical guarantee of our algorithm to converge to an optimal policy. We demonstrate that LARL-RM speeds up the convergence by 30% by implementing our method in two case studies.
- [1275] arXiv:2402.07076 (cross-list from cs.IR) [ pdf , ps , other ]
-
Title: Enhancing Multi-field B2B Cloud Solution Matching via Contrastive Pre-trainingSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI)
Abstract: Cloud solutions have gained significant popularity in the technology industry as they offer a combination of services and tools to tackle specific problems. However, despite their widespread use, the task of identifying appropriate company customers for a specific target solution to the sales team of a solution provider remains a complex business problem that existing matching systems have yet to adequately address. In this work, we study the B2B solution matching problem and identify two main challenges of this scenario: (1) the modeling of complex multi-field features and (2) the limited, incomplete, and sparse transaction data. To tackle these challenges, we propose a framework CAMA, which is built with a hierarchical multi-field matching structure as its backbone and supplemented by three data augmentation strategies and a contrastive pre-training objective to compensate for the imperfections in the available data. Through extensive experiments on a real-world dataset, we demonstrate that CAMA outperforms several strong baseline matching models significantly. Furthermore, we have deployed our matching framework on a system of Huawei Cloud. Our observations indicate an improvement of about 30% compared to the previous online model in terms of Conversion Rate (CVR), which demonstrates its great business value.
- [1276] arXiv:2402.07087 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Self-Correcting Self-Consuming Loops for Generative Model TrainingNate Gillman , Michael Freeman , Daksh Aggarwal , Chia-Hong Hsu , Calvin Luo , Yonglong Tian , Chen SunComments: This new version contains updated mathematical results (c.f. Remark 4.4), as well as experiments for an additional generative modeling task. Paper under submission; code is available at this https URLSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Abstract: As synthetic data becomes higher quality and proliferates on the internet, machine learning models are increasingly trained on a mix of human- and machine-generated data. Despite the successful stories of using synthetic data for representation learning, using synthetic data for generative model training creates "self-consuming loops" which may lead to training instability or even collapse, unless certain conditions are met. Our paper aims to stabilize self-consuming generative model training. Our theoretical results demonstrate that by introducing an idealized correction function, which maps a data point to be more likely under the true data distribution, self-consuming loops can be made exponentially more stable. We then propose self-correction functions, which rely on expert knowledge (e.g. the laws of physics programmed in a simulator), and aim to approximate the idealized corrector automatically and at scale. We empirically validate the effectiveness of self-correcting self-consuming loops on the challenging human motion synthesis task, and observe that it successfully avoids model collapse, even when the ratio of synthetic data to real data is as high as 100%.
- [1277] arXiv:2402.07098 (cross-list from cs.RO) [ pdf , ps , html , other ]
-
Title: Improving Pallet Detection Using Synthetic DataComments: Australasian Conference on Robotics and Automation (ACRA 2023)Subjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: The use of synthetic data in machine learning saves a significant amount of time when implementing an effective object detector. However, there is limited research in this domain. This study aims to improve upon previously applied implementations in the task of instance segmentation of pallets in a warehouse environment. This study proposes using synthetically generated domain-randomised data as well as data generated through Unity to achieve this. This study achieved performance improvements on the stacked and racked pallet categories by 69% and 50% mAP50, respectively when being evaluated on real data. Additionally, it was found that there was a considerable impact on the performance of a model when it was evaluated against images in a darker environment, dropping as low as 3% mAP50 when being evaluated on images with an 80% brightness reduction. This study also created a two-stage detector that used YOLOv8 and SAM, but this proved to have unstable performance. The use of domain-randomised data proved to have negligible performance improvements when compared to the Unity-generated data.
- [1278] arXiv:2402.07102 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Future Prediction Can be a Strong Evidence of Good History Representation in Partially Observable EnvironmentsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Learning a good history representation is one of the core challenges of reinforcement learning (RL) in partially observable environments. Recent works have shown the advantages of various auxiliary tasks for facilitating representation learning. However, the effectiveness of such auxiliary tasks has not been fully convincing, especially in partially observable environments that require long-term memorization and inference. In this empirical study, we investigate the effectiveness of future prediction for learning the representations of histories, possibly of extensive length, in partially observable environments. We first introduce an approach that decouples the task of learning history representations from policy optimization via future prediction. Then, our main contributions are two-fold: (a) we demonstrate that the performance of reinforcement learning is strongly correlated with the prediction accuracy of future observations in partially observable environments, and (b) our approach can significantly improve the overall end-to-end approach by preventing high-variance noisy signals from reinforcement learning objectives to influence the representation learning. We illustrate our claims on three types of benchmarks that necessitate the ability to process long histories for high returns.
- [1279] arXiv:2402.07107 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Echoes of Socratic Doubt: Embracing Uncertainty in Calibrated Evidential Reinforcement LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: We present a novel statistical approach to incorporating uncertainty awareness in model-free distributional reinforcement learning involving quantile regression-based deep Q networks. The proposed algorithm, $\textit{Calibrated Evidential Quantile Regression in Deep Q Networks (CEQR-DQN)}$, aims to address key challenges associated with separately estimating aleatoric and epistemic uncertainty in stochastic environments. It combines deep evidential learning with quantile calibration based on principles of conformal inference to provide explicit, sample-free computations of $\textit{global}$ uncertainty as opposed to $\textit{local}$ estimates based on simple variance, overcoming limitations of traditional methods in computational and statistical efficiency and handling of out-of-distribution (OOD) observations. Tested on a suite of miniaturized Atari games (i.e., MinAtar), CEQR-DQN is shown to surpass similar existing frameworks in scores and learning speed. Its ability to rigorously evaluate uncertainty improves exploration strategies and can serve as a blueprint for other algorithms requiring uncertainty awareness.
- [1280] arXiv:2402.07118 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: Next-Generation Teleophthalmology: AI-enabled Quality Assessment Aiding Remote Smartphone-based ConsultationComments: 4 pages, Submitted to IEEE EMBC 2024Subjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Signal Processing (eess.SP)
Abstract: Blindness and other eye diseases are a global health concern, particularly in low- and middle-income countries like India. In this regard, during the COVID-19 pandemic, teleophthalmology became a lifeline, and the Grabi attachment for smartphone-based eye imaging gained in use. However, quality of user-captured image often remained inadequate, requiring clinician vetting and delays. In this backdrop, we propose an AI-based quality assessment system with instant feedback mimicking clinicians' judgments and tested on patient-captured images. Dividing the complex problem hierarchically, here we tackle a nontrivial part, and demonstrate a proof of the concept.
- [1281] arXiv:2402.07127 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: Learning by Watching: A Review of Video-based Learning Approaches for Robot ManipulationSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: Robot learning of manipulation skills is hindered by the scarcity of diverse, unbiased datasets. While curated datasets can help, challenges remain in generalizability and real-world transfer. Meanwhile, large-scale "in-the-wild" video datasets have driven progress in computer vision through self-supervised techniques. Translating this to robotics, recent works have explored learning manipulation skills by passively watching abundant videos sourced online. Showing promising results, such video-based learning paradigms provide scalable supervision while reducing dataset bias. This survey reviews foundations such as video feature representation learning techniques, object affordance understanding, 3D hand/body modeling, and large-scale robot resources, as well as emerging techniques for acquiring robot manipulation skills from uncontrolled video demonstrations. We discuss how learning only from observing large-scale human videos can enhance generalization and sample efficiency for robotic manipulation. The survey summarizes video-based learning approaches, analyses their benefits over standard datasets, survey metrics, and benchmarks, and discusses open challenges and future directions in this nascent domain at the intersection of computer vision, natural language processing, and robot learning.
- [1282] arXiv:2402.07129 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: An attempt to generate new bridge types from latent space of denoising diffusion Implicit modelComments: 9 pages, 7 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Use denoising diffusion implicit model for bridge-type innovation. The process of adding noise and denoising to an image can be likened to the process of a corpse rotting and a detective restoring the scene of a victim being killed, to help beginners understand. Through an easy-to-understand algebraic method, derive the function formulas for adding noise and denoising, making it easier for beginners to master the mathematical principles of the model. Using symmetric structured image dataset of three-span beam bridge, arch bridge, cable-stayed bridge and suspension bridge , based on Python programming language, TensorFlow and Keras deep learning platform framework , denoising diffusion implicit model is constructed and trained. From the latent space sampling, new bridge types with asymmetric structures can be generated. Denoising diffusion implicit model can organically combine different structural components on the basis of human original bridge types, and create new bridge types.
- [1283] arXiv:2402.07148 (cross-list from cond-mat.soft) [ pdf , ps , html , other ]
-
Title: X-LoRA: Mixture of Low-Rank Adapter Experts, a Flexible Framework for Large Language Models with Applications in Protein Mechanics and Molecular DesignSubjects: Soft Condensed Matter (cond-mat.soft) ; Disordered Systems and Neural Networks (cond-mat.dis-nn); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
Abstract: We report a mixture of expert strategy to create fine-tuned large language models using a deep layer-wise token-level approach based on low-rank adaptation (LoRA). Starting with a set of pre-trained LoRA adapters, our gating strategy uses the hidden states to dynamically mix adapted layers, allowing the resulting X-LoRA model to draw upon different capabilities and create never-before-used deep layer-wise combinations to solve tasks. The design is inspired by the biological principles of universality and diversity, where neural network building blocks are reused in different hierarchical manifestations. Hence, the X-LoRA model can be easily implemented for any existing large language model (LLM) without a need for modifications of the underlying structure. We develop a tailored X-LoRA model that offers scientific capabilities including forward/inverse analysis tasks and enhanced reasoning capability, focused on biomaterial analysis, protein mechanics and design. The impact of this work include access to readily expandable and adaptable models with strong domain knowledge and the capability to integrate across areas of knowledge. Featuring experts in biology, mathematics, reasoning, bio-inspired materials, mechanics and materials, chemistry, protein biophysics, mechanics and quantum-mechanics based molecular properties, we conduct a series of physics-focused case studies. We examine knowledge recall, protein mechanics forward/inverse tasks, protein design, adversarial agentic modeling including ontological knowledge graph construction, as well as molecular design. The model is capable not only of making quantitative predictions of nanomechanical properties of proteins or quantum mechanical molecular properties, but also reasons over the results and correctly predicts likely mechanisms that explain distinct molecular behaviors.
- [1284] arXiv:2402.07152 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Explainable Global Wildfire Prediction Models using Graph Neural NetworksSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Wildfire prediction has become increasingly crucial due to the escalating impacts of climate change. Traditional CNN-based wildfire prediction models struggle with handling missing oceanic data and addressing the long-range dependencies across distant regions in meteorological data. In this paper, we introduce an innovative Graph Neural Network (GNN)-based model for global wildfire prediction. We propose a hybrid model that combines the spatial prowess of Graph Convolutional Networks (GCNs) with the temporal depth of Long Short-Term Memory (LSTM) networks. Our approach uniquely transforms global climate and wildfire data into a graph representation, addressing challenges such as null oceanic data locations and long-range dependencies inherent in traditional models. Benchmarking against established architectures using an unseen ensemble of JULES-INFERNO simulations, our model demonstrates superior predictive accuracy. Furthermore, we emphasise the model's explainability, unveiling potential wildfire correlation clusters through community detection and elucidating feature importance via Integrated Gradient analysis. Our findings not only advance the methodological domain of wildfire prediction but also underscore the importance of model transparency, offering valuable insights for stakeholders in wildfire management.
- [1285] arXiv:2402.07153 (cross-list from math.NA) [ pdf , ps , html , other ]
-
Title: Error Estimation for Physics-informed Neural Networks Approximating Semilinear Wave EquationsSubjects: Numerical Analysis (math.NA) ; Artificial Intelligence (cs.AI)
Abstract: This paper provides rigorous error bounds for physics-informed neural networks approximating the semilinear wave equation. We provide bounds for the generalization and training error in terms of the width of the network's layers and the number of training points for a tanh neural network with two hidden layers. Our main result is a bound of the total error in the $H^1([0,T];L^2(\Omega))$-norm in terms of the training error and the number of training points, which can be made arbitrarily small under some assumptions. We illustrate our theoretical bounds with numerical experiments.
- [1286] arXiv:2402.07157 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Natural Language Reinforcement LearningXidong Feng , Ziyu Wan , Mengyue Yang , Ziyan Wang , Girish A. Koushik , Yali Du , Ying Wen , Jun WangComments: Work in ProgressSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Reinforcement Learning (RL) has shown remarkable abilities in learning policies for decision-making tasks. However, RL is often hindered by issues such as low sample efficiency, lack of interpretability, and sparse supervision signals. To tackle these limitations, we take inspiration from the human learning process and introduce Natural Language Reinforcement Learning (NLRL), which innovatively combines RL principles with natural language representation. Specifically, NLRL redefines RL concepts like task objectives, policy, value function, Bellman equation, and policy iteration in natural language space. We present how NLRL can be practically implemented with the latest advancements in large language models (LLMs) like GPT-4. Initial experiments over tabular MDPs demonstrate the effectiveness, efficiency, and also interpretability of the NLRL framework.
- [1287] arXiv:2402.07174 (cross-list from cs.HC) [ pdf , ps , html , other ]
-
Title: EmoWear: Exploring Emotional Teasers for Voice Message Interaction on SmartwatchesComments: To appear at ACM CHI '24Subjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Voice messages, by nature, prevent users from gauging the emotional tone without fully diving into the audio content. This hinders the shared emotional experience at the pre-retrieval stage. Research scarcely explored "Emotional Teasers"-pre-retrieval cues offering a glimpse into an awaiting message's emotional tone without disclosing its content. We introduce EmoWear, a smartwatch voice messaging system enabling users to apply 30 animation teasers on message bubbles to reflect emotions. EmoWear eases senders' choice by prioritizing emotions based on semantic and acoustic processing. EmoWear was evaluated in comparison with a mirroring system using color-coded message bubbles as emotional cues (N=24). Results showed EmoWear significantly enhanced emotional communication experience in both receiving and sending messages. The animated teasers were considered intuitive and valued for diverse expressions. Desirable interaction qualities and practical implications are distilled for future design. We thereby contribute both a novel system and empirical knowledge concerning emotional teasers for voice messaging.
- [1288] arXiv:2402.07180 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: MAGNETO: Edge AI for Human Activity Recognition -- Privacy and PersonalizationComments: Accepted by EDBT 2024 (demo track)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: Human activity recognition (HAR) is a well-established field, significantly advanced by modern machine learning (ML) techniques. While companies have successfully integrated HAR into consumer products, they typically rely on a predefined activity set, which limits personalizations at the user level (edge devices). Despite advancements in Incremental Learning for updating models with new data, this often occurs on the Cloud, necessitating regular data transfers between cloud and edge devices, thus leading to data privacy issues. In this paper, we propose MAGNETO, an Edge AI platform that pushes HAR tasks from the Cloud to the Edge. MAGNETO allows incremental human activity learning directly on the Edge devices, without any data exchange with the Cloud. This enables strong privacy guarantees, low processing latency, and a high degree of personalization for users. In particular, we demonstrate MAGNETO in an Android device, validating the whole pipeline from data collection to result visualization.
- [1289] arXiv:2402.07191 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: GSINA: Improving Subgraph Extraction for Graph Invariant Learning via Graph Sinkhorn AttentionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Graph invariant learning (GIL) has been an effective approach to discovering the invariant relationships between graph data and its labels for different graph learning tasks under various distribution shifts. Many recent endeavors of GIL focus on extracting the invariant subgraph from the input graph for prediction as a regularization strategy to improve the generalization performance of graph learning. Despite their success, such methods also have various limitations in obtaining their invariant subgraphs. In this paper, we provide in-depth analyses of the drawbacks of existing works and propose corresponding principles of our invariant subgraph extraction: 1) the sparsity, to filter out the variant features, 2) the softness, for a broader solution space, and 3) the differentiability, for a soundly end-to-end optimization. To meet these principles in one shot, we leverage the Optimal Transport (OT) theory and propose a novel graph attention mechanism called Graph Sinkhorn Attention (GSINA). This novel approach serves as a powerful regularization method for GIL tasks. By GSINA, we are able to obtain meaningful, differentiable invariant subgraphs with controllable sparsity and softness. Moreover, GSINA is a general graph learning framework that could handle GIL tasks of multiple data grain levels. Extensive experiments on both synthetic and real-world datasets validate the superiority of our GSINA, which outperforms the state-of-the-art GIL methods by large margins on both graph-level tasks and node-level tasks. Our code is publicly available at \url{ this https URL }.
- [1290] arXiv:2402.07229 (cross-list from cs.IT) [ pdf , ps , other ]
-
Title: Successive Refinement in Large-Scale Computation: Advancing Model Inference ApplicationsComments: 13 pages, partially appeared in proceedings of IEEE Cloudnet 2022, submitted and under review for IEEE Transactions on Signal ProcessingSubjects: Information Theory (cs.IT) ; Artificial Intelligence (cs.AI)
Abstract: Modern computationally-intensive applications often operate under time constraints, necessitating acceleration methods and distribution of computational workloads across multiple entities. However, the outcome is either achieved within the desired timeline or not, and in the latter case, valuable resources are wasted. In this paper, we introduce solutions for layered-resolution computation. These solutions allow lower-resolution results to be obtained at an earlier stage than the final result. This innovation notably enhances the deadline-based systems, as if a computational job is terminated due to time constraints, an approximate version of the final result can still be generated. Moreover, in certain operational regimes, a high-resolution result might be unnecessary, because the low-resolution result may already deviate significantly from the decision threshold, for example in AI-based decision-making systems. Therefore, operators can decide whether higher resolution is needed or not based on intermediate results, enabling computations with adaptive resolution. We present our framework for two critical and computationally demanding jobs: distributed matrix multiplication (linear) and model inference in machine learning (nonlinear). Our theoretical and empirical results demonstrate that the execution delay for the first resolution is significantly shorter than that for the final resolution, while maintaining overall complexity comparable to the conventional one-shot approach. Our experiments further illustrate how the layering feature increases the likelihood of meeting deadlines and enables adaptability and transparency in massive, large-scale computations.
- [1291] arXiv:2402.07233 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: TransGPT: Multi-modal Generative Pre-trained Transformer for TransportationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Natural language processing (NLP) is a key component of intelligent transportation systems (ITS), but it faces many challenges in the transportation domain, such as domain-specific knowledge and data, and multi-modal inputs and outputs. This paper presents TransGPT, a novel (multi-modal) large language model for the transportation domain, which consists of two independent variants: TransGPT-SM for single-modal data and TransGPT-MM for multi-modal data. TransGPT-SM is finetuned on a single-modal Transportation dataset (STD) that contains textual data from various sources in the transportation domain. TransGPT-MM is finetuned on a multi-modal Transportation dataset (MTD) that we manually collected from three areas of the transportation domain: driving tests, traffic signs, and landmarks. We evaluate TransGPT on several benchmark datasets for different tasks in the transportation domain, and show that it outperforms baseline models on most tasks. We also showcase the potential applications of TransGPT for traffic analysis and modeling, such as generating synthetic traffic scenarios, explaining traffic phenomena, answering traffic-related questions, providing traffic recommendations, and generating traffic reports. This work advances the state-of-the-art of NLP in the transportation domain and provides a useful tool for ITS researchers and practitioners.
- [1292] arXiv:2402.07244 (cross-list from cs.NE) [ pdf , ps , other ]
-
Title: SAIS: A Novel Bio-Inspired Artificial Immune System Based on Symbiotic ParadigmSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI)
Abstract: We propose a novel type of Artificial Immune System (AIS): Symbiotic Artificial Immune Systems (SAIS), drawing inspiration from symbiotic relationships in biology. SAIS parallels the three key stages (i.e., mutualism, commensalism and parasitism) of population updating from the Symbiotic Organisms Search (SOS) algorithm. This parallel approach effectively addresses the challenges of large population size and enhances population diversity in AIS, which traditional AIS and SOS struggle to resolve efficiently. We conducted a series of experiments, which demonstrated that our SAIS achieved comparable performance to the state-of-the-art approach SOS and outperformed other popular AIS approaches and evolutionary algorithms across 26 benchmark problems. Furthermore, we investigated the problem of parameter selection and found that SAIS performs better in handling larger population sizes while requiring fewer generations. Finally, we believe SAIS, as a novel bio-inspired and immune-inspired algorithm, paves the way for innovation in bio-inspired computing with the symbiotic paradigm.
- [1293] arXiv:2402.07250 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: DIMON: Learning Solution Operators of Partial Differential Equations on a Diffeomorphic Family of DomainsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)
Abstract: The solution of a PDE over varying initial/boundary conditions on multiple domains is needed in a wide variety of applications, but it is computationally expensive if the solution is computed de novo whenever the initial/boundary conditions of the domain change. We introduce a general operator learning framework, called DIffeomorphic Mapping Operator learNing (DIMON) to learn approximate PDE solutions over a family of domains $\{\Omega_{\theta}}_\theta$, that learns the map from initial/boundary conditions and domain $\Omega_\theta$ to the solution of the PDE, or to specified functionals thereof. DIMON is based on transporting a given problem (initial/boundary conditions and domain $\Omega_{\theta}$) to a problem on a reference domain $\Omega_{0}$, where training data from multiple problems is used to learn the map to the solution on $\Omega_{0}$, which is then re-mapped to the original domain $\Omega_{\theta}$. We consider several problems to demonstrate the performance of the framework in learning both static and time-dependent PDEs on non-rigid geometries; these include solving the Laplace equation, reaction-diffusion equations, and a multiscale PDE that characterizes the electrical propagation on the left ventricle. This work paves the way toward the fast prediction of PDE solutions on a family of domains and the application of neural operators in engineering and precision medicine.
- [1294] arXiv:2402.07268 (cross-list from q-bio.GN) [ pdf , ps , html , other ]
-
Title: Highly Accurate Disease Diagnosis and Highly Reproducible Biomarker Identification with PathFormerZehao Dong , Qihang Zhao , Philip R.O. Payne , Michael A Province , Carlos Cruchaga , Muhan Zhang , Tianyu Zhao , Yixin Chen , Fuhai LiSubjects: Genomics (q-bio.GN) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Biomarker identification is critical for precise disease diagnosis and understanding disease pathogenesis in omics data analysis, like using fold change and regression analysis. Graph neural networks (GNNs) have been the dominant deep learning model for analyzing graph-structured data. However, we found two major limitations of existing GNNs in omics data analysis, i.e., limited-prediction (diagnosis) accuracy and limited-reproducible biomarker identification capacity across multiple datasets. The root of the challenges is the unique graph structure of biological signaling pathways, which consists of a large number of targets and intensive and complex signaling interactions among these targets. To resolve these two challenges, in this study, we presented a novel GNN model architecture, named PathFormer, which systematically integrate signaling network, priori knowledge and omics data to rank biomarkers and predict disease diagnosis. In the comparison results, PathFormer outperformed existing GNN models significantly in terms of highly accurate prediction capability ( 30% accuracy improvement in disease diagnosis compared with existing GNN models) and high reproducibility of biomarker ranking across different datasets. The improvement was confirmed using two independent Alzheimer's Disease (AD) and cancer transcriptomic datasets. The PathFormer model can be directly applied to other omics data analysis studies.
- [1295] arXiv:2402.07282 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: How do Large Language Models Navigate Conflicts between Honesty and Helpfulness?Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: In day-to-day communication, people often approximate the truth - for example, rounding the time or omitting details - in order to be maximally helpful to the listener. How do large language models (LLMs) handle such nuanced trade-offs? To address this question, we use psychological models and experiments designed to characterize human behavior to analyze LLMs. We test a range of LLMs and explore how optimization for human preferences or inference-time reasoning affects these trade-offs. We find that reinforcement learning from human feedback improves both honesty and helpfulness, while chain-of-thought prompting skews LLMs towards helpfulness over honesty. Finally, GPT-4 Turbo demonstrates human-like response patterns including sensitivity to the conversational framing and listener's decision context. Our findings reveal the conversational values internalized by LLMs and suggest that even these abstract values can, to a degree, be steered by zero-shot prompting.
- [1296] arXiv:2402.07283 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Power Transformer Fault Prediction Based on Knowledge GraphsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: In this paper, we address the challenge of learning with limited fault data for power transformers. Traditional operation and maintenance tools lack effective predictive capabilities for potential faults. The scarcity of extensive fault data makes it difficult to apply machine learning techniques effectively. To solve this problem, we propose a novel approach that leverages the knowledge graph (KG) technology in combination with gradient boosting decision trees (GBDT). This method is designed to efficiently learn from a small set of high-dimensional data, integrating various factors influencing transformer faults and historical operational data. Our approach enables accurate safe state assessments and fault analyses of power transformers despite the limited fault characteristic data. Experimental results demonstrate that this method outperforms other learning approaches in prediction accuracy, such as artificial neural networks (ANN) and logistic regression (LR). Furthermore, it offers significant improvements in progressiveness, practicality, and potential for widespread application.
- [1297] arXiv:2402.07294 (cross-list from cs.SE) [ pdf , ps , other ]
-
Title: On the Effectiveness of Machine Learning-based Call Graph Pruning: An Empirical StudyComments: Accepted at the technical track of MSR'24Subjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Programming Languages (cs.PL)
Abstract: Static call graph (CG) construction often over-approximates call relations, leading to sound, but imprecise results. Recent research has explored machine learning (ML)-based CG pruning as a means to enhance precision by eliminating false edges. However, current methods suffer from a limited evaluation dataset, imbalanced training data, and reduced recall, which affects practical downstream analyses. Prior results were also not compared with advanced static CG construction techniques yet. This study tackles these issues. We introduce the NYXCorpus, a dataset of real-world Java programs with high test coverage and we collect traces from test executions and build a ground truth of dynamic CGs. We leverage these CGs to explore conservative pruning strategies during the training and inference of ML-based CG pruners. We conduct a comparative analysis of static CGs generated using zero control flow analysis (0-CFA) and those produced by a context-sensitive 1-CFA algorithm, evaluating both with and without pruning. We find that CG pruning is a difficult task for real-world Java projects and substantial improvements in the CG precision (+25%) meet reduced recall (-9%). However, our experiments show promising results: even when we favor recall over precision by using an F2 metric in our experiments, we can show that pruned CGs have comparable quality to a context-sensitive 1-CFA analysis while being computationally less demanding. Resulting CGs are much smaller (69%), and substantially faster (3.5x speed-up), with virtually unchanged results in our downstream analysis.
- [1298] arXiv:2402.07295 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Training Heterogeneous Client Models using Knowledge Distillation in Serverless Federated LearningComments: ACM/SIGAPP Symposium on Applied Computing 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: Federated Learning (FL) is an emerging machine learning paradigm that enables the collaborative training of a shared global model across distributed clients while keeping the data decentralized. Recent works on designing systems for efficient FL have shown that utilizing serverless computing technologies, particularly Function-as-a-Service (FaaS) for FL, can enhance resource efficiency, reduce training costs, and alleviate the complex infrastructure management burden on data holders. However, existing serverless FL systems implicitly assume a uniform global model architecture across all participating clients during training. This assumption fails to address fundamental challenges in practical FL due to the resource and statistical data heterogeneity among FL clients. To address these challenges and enable heterogeneous client models in serverless FL, we utilize Knowledge Distillation (KD) in this paper. Towards this, we propose novel optimized serverless workflows for two popular conventional federated KD techniques, i.e., FedMD and FedDF. We implement these workflows by introducing several extensions to an open-source serverless FL system called FedLess. Moreover, we comprehensively evaluate the two strategies on multiple datasets across varying levels of client data heterogeneity using heterogeneous client models with respect to accuracy, fine-grained training times, and costs. Results from our experiments demonstrate that serverless FedDF is more robust to extreme non-IID data distributions, is faster, and leads to lower costs than serverless FedMD. In addition, compared to the original implementation, our optimizations for particular steps in FedMD and FedDF lead to an average speedup of 3.5x and 1.76x across all datasets.
- [1299] arXiv:2402.07319 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: ODIN: Disentangled Reward Mitigates Hacking in RLHFLichang Chen , Chen Zhu , Davit Soselia , Jiuhai Chen , Tianyi Zhou , Tom Goldstein , Heng Huang , Mohammad Shoeybi , Bryan CatanzaroSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: In this work, we study the issue of reward hacking on the response length, a challenge emerging in Reinforcement Learning from Human Feedback (RLHF) on LLMs. A well-formatted, verbose but less helpful response from the LLMs can often deceive LLMs or even human evaluators to achieve high scores. The same issue also holds for some reward models in RL. To address the challenges in both training and evaluation, we establish a more reliable evaluation protocol for comparing different training configurations, which inspects the trade-off between LLM evaluation score and response length obtained by varying training hyperparameters. Based on this evaluation, we conduct large-scale studies, where the results shed insights into the efficacy of hyperparameters and tricks used in RL on mitigating length bias. We further propose to improve the reward model by jointly training two linear heads on shared feature representations to predict the rewards, one trained to correlate with length, and the other trained to decorrelate with length and therefore focus more on the actual content. We then discard the length head in RL to prevent reward hacking on length. Experiments demonstrate that our approach almost eliminates the reward correlation with length, and improves the obtained policy by a significant margin.
- [1300] arXiv:2402.07320 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Towards Explainable, Safe Autonomous Driving with Language Embeddings for Novelty Identification and Active Learning: Framework and Experimental Analysis with Real-World Data SetsSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This research explores the integration of language embeddings for active learning in autonomous driving datasets, with a focus on novelty detection. Novelty arises from unexpected scenarios that autonomous vehicles struggle to navigate, necessitating higher-level reasoning abilities. Our proposed method employs language-based representations to identify novel scenes, emphasizing the dual purpose of safety takeover responses and active learning. The research presents a clustering experiment using Contrastive Language-Image Pretrained (CLIP) embeddings to organize datasets and detect novelties. We find that the proposed algorithm effectively isolates novel scenes from a collection of subsets derived from two real-world driving datasets, one vehicle-mounted and one infrastructure-mounted. From the generated clusters, we further present methods for generating textual explanations of elements which differentiate scenes classified as novel from other scenes in the data pool, presenting qualitative examples from the clustered results. Our results demonstrate the effectiveness of language-driven embeddings in identifying novel elements and generating explanations of data, and we further discuss potential applications in safe takeovers, data curation, and multi-task active learning.
- [1301] arXiv:2402.07342 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: Imagining a Future of Designing with AI: Dynamic Grounding, Constructive Negotiation, and Sustainable MotivationComments: 12 pages, 4 figuresSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: We ideate a future design workflow that involves AI technology. Drawing from activity and communication theory, we attempt to isolate the new value large AI models can provide design compared to past technologies. We arrive at three affordances -- dynamic grounding, constructive negotiation, and sustainable motivation -- that summarize latent qualities of natural language-enabled foundation models that, if explicitly designed for, can support the process of design. Through design fiction, we then imagine a future interface as a diegetic prototype, the story of Squirrel Game, that demonstrates each of our three affordances in a realistic usage scenario. Our design process, terminology, and diagrams aim to contribute to future discussions about the relative affordances of AI technology with regard to collaborating with human designers.
- [1302] arXiv:2402.07344 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Measurement Scheduling for ICU Patients with Offline Reinforcement LearningComments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2023, December 10th, 2023, New Orleans, United States, 11 pagesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Scheduling laboratory tests for ICU patients presents a significant challenge. Studies show that 20-40% of lab tests ordered in the ICU are redundant and could be eliminated without compromising patient safety. Prior work has leveraged offline reinforcement learning (Offline-RL) to find optimal policies for ordering lab tests based on patient information. However, new ICU patient datasets have since been released, and various advancements have been made in Offline-RL methods. In this study, we first introduce a preprocessing pipeline for the newly-released MIMIC-IV dataset geared toward time-series tasks. We then explore the efficacy of state-of-the-art Offline-RL methods in identifying better policies for ICU patient lab test scheduling. Besides assessing methodological performance, we also discuss the overall suitability and practicality of using Offline-RL frameworks for scheduling laboratory tests in ICU settings.
- [1303] arXiv:2402.07347 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Accuracy of TextFooler black box adversarial attacks on 01 loss sign activation neural network ensembleSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: Recent work has shown the defense of 01 loss sign activation neural networks against image classification adversarial attacks. A public challenge to attack the models on CIFAR10 dataset remains undefeated. We ask the following question in this study: are 01 loss sign activation neural networks hard to deceive with a popular black box text adversarial attack program called TextFooler? We study this question on four popular text classification datasets: IMDB reviews, Yelp reviews, MR sentiment classification, and AG news classification. We find that our 01 loss sign activation network is much harder to attack with TextFooler compared to sigmoid activation cross entropy and binary neural networks. We also study a 01 loss sign activation convolutional neural network with a novel global pooling step specific to sign activation networks. With this new variation we see a significant gain in adversarial accuracy rendering TextFooler practically useless against it. We make our code freely available at \url{ this https URL } and \url{ this https URL }. Our work here suggests that 01 loss sign activation networks could be further developed to create fool proof models against text adversarial attacks.
- [1304] arXiv:2402.07366 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Bayesian Federated Learning Via Expectation Maximization and Turbo Deep Approximate Message PassingSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Federated learning (FL) is a machine learning paradigm where the clients possess decentralized training data and the central server handles aggregation and scheduling. Typically, FL algorithms involve clients training their local models using stochastic gradient descent (SGD), which carries drawbacks such as slow convergence and being prone to getting stuck in suboptimal solutions. In this work, we propose a message passing based Bayesian federated learning (BFL) framework to avoid these drawbacks.Specifically, we formulate the problem of deep neural network (DNN) learning and compression and as a sparse Bayesian inference problem, in which group sparse prior is employed to achieve structured model compression. Then, we propose an efficient BFL algorithm called EMTDAMP, where expectation maximization (EM) and turbo deep approximate message passing (TDAMP) are combined to achieve distributed learning and compression. The central server aggregates local posterior distributions to update global posterior distributions and update hyperparameters based on EM to accelerate convergence. The clients perform TDAMP to achieve efficient approximate message passing over DNN with joint prior distribution. We detail the application of EMTDAMP to Boston housing price prediction and handwriting recognition, and present extensive numerical results to demonstrate the advantages of EMTDAMP.
- [1305] arXiv:2402.07368 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Assessing Generalization for Subpopulation Representative Modeling via In-Context LearningComments: Accepted to PERSONALIZE workshop at EACL 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)
Abstract: This study evaluates the ability of Large Language Model (LLM)-based Subpopulation Representative Models (SRMs) to generalize from empirical data, utilizing in-context learning with data from the 2016 and 2020 American National Election Studies. We explore generalization across response variables and demographic subgroups. While conditioning with empirical data improves performance on the whole, the benefit of in-context learning varies considerably across demographics, sometimes hurting performance for one demographic while helping performance for others. The inequitable benefits of in-context learning for SRM present a challenge for practitioners implementing SRMs, and for decision-makers who might come to rely on them. Our work highlights a need for fine-grained benchmarks captured from diverse subpopulations that test not only fidelity but generalization.
- [1306] arXiv:2402.07370 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: SelfSwapper: Self-Supervised Face Swapping via Shape Agnostic Masked AutoEncoderSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Face swapping has gained significant attention for its varied applications. The majority of previous face swapping approaches have relied on the seesaw game training scheme, which often leads to the instability of the model training and results in undesired samples with blended identities due to the target identity leakage problem. This paper introduces the Shape Agnostic Masked AutoEncoder (SAMAE) training scheme, a novel self-supervised approach designed to enhance face swapping model training. Our training scheme addresses the limitations of traditional training methods by circumventing the conventional seesaw game and introducing clear ground truth through its self-reconstruction training regime. It effectively mitigates identity leakage by masking facial regions of the input images and utilizing learned disentangled identity and non-identity features. Additionally, we tackle the shape misalignment problem with new techniques including perforation confusion and random mesh scaling, and establishes a new state-of-the-art, surpassing other baseline methods, preserving both identity and non-identity attributes, without sacrificing on either aspect.
- [1307] arXiv:2402.07384 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Exploring Perceptual Limitation of Multimodal Large Language ModelsComments: 14 pages, 14 figures, 3 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Multimodal Large Language Models (MLLMs) have recently shown remarkable perceptual capability in answering visual questions, however, little is known about the limits of their perception. In particular, while prior works have provided anecdotal evidence of MLLMs' sensitivity to object size, this phenomenon and its underlying causes have not been explored comprehensively. In this work, we quantitatively study the perception of small visual objects in several state-of-the-art MLLMs and reveal a pervasive limitation in answering questions about small objects in images. Next, we identify four independent factors that can contribute to this limitation -- object quality, size, distractors, and location -- and conduct controlled intervention studies to measure the effect of each factor on MLLMs' perception. In particular, we find that lower object quality and smaller object size can both independently reduce MLLMs' ability to answer visual questions. More surprisingly, we find that the location of the object in the image and the presence of visual distractors can also significantly reduce MLLMs' question answering accuracy. Our study provides a better understanding of the perceptual limitation of MLLMs and contributes new evaluation protocols for analyzing the perception of future MLLMs. To facilitate further investigations, we release our code and data.
- [1308] arXiv:2402.07393 (cross-list from cs.ET) [ pdf , ps , html , other ]
-
Title: TeMPO: Efficient Time-Multiplexed Dynamic Photonic Tensor Core for Edge AI with Compact Slow-Light Electro-Optic ModulatorMeng Zhang , Dennis Yin , Nicholas Gangi , Amir Begović , Alexander Chen , Zhaoran Rena Huang , Jiaqi GuComments: 17 pages, 19 figuresSubjects: Emerging Technologies (cs.ET) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Electronic-photonic computing systems offer immense potential in energy-efficient artificial intelligence (AI) acceleration tasks due to the superior computing speed and efficiency of optics, especially for real-time, low-energy deep neural network (DNN) inference tasks on resource-restricted edge platforms. However, current optical neural accelerators based on foundry-available devices and conventional system architecture still encounter a performance gap compared to highly customized electronic counterparts. To bridge the performance gap due to lack of domain specialization, we present a time-multiplexed dynamic photonic tensor accelerator, dubbed TeMPO, with cross-layer device/circuit/architecture customization. At the device level, we present foundry-compatible, customized photonic devices, including a slow-light electro-optic modulator with experimental demonstration, optical splitters, and phase shifters that significantly reduce the footprint and power in input encoding and dot-product calculation. At the circuit level, partial products are hierarchically accumulated via parallel photocurrent aggregation, lightweight capacitive temporal integration, and sequential digital summation, considerably relieving the analog-to-digital conversion bottleneck. We also employ a multi-tile, multi-core architecture to maximize hardware sharing for higher efficiency. Across diverse edge AI workloads, TeMPO delivers digital-comparable task accuracy with superior quantization/noise tolerance. We achieve a 368.6 TOPS peak performance, 22.3 TOPS/W energy efficiency, and 1.2 TOPS/mm$^2$ compute density, pushing the Pareto frontier in edge AI hardware. This work signifies the power of cross-layer co-design and domain-specific customization, paving the way for future electronic-photonic accelerators with even greater performance and efficiency.
- [1309] arXiv:2402.07402 (cross-list from cs.MM) [ pdf , ps , other ]
-
Title: BDIQA: A New Dataset for Video Question Answering to Explore Cognitive Reasoning through Theory of MindSubjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI)
Abstract: As a foundational component of cognitive intelligence, theory of mind (ToM) can make AI more closely resemble human thought processes, thereby enhancing their interaction and collaboration with human. In particular, it can significantly improve a model's comprehension of videos in complex scenes. However, current video question answer (VideoQA) datasets focus on studying causal reasoning within events few of them genuinely incorporating human ToM. Consequently, there is a lack of development in ToM reasoning tasks within the area of VideoQA. This paper presents BDIQA, the first benchmark to explore the cognitive reasoning capabilities of VideoQA models in the context of ToM. BDIQA is inspired by the cognitive development of children's ToM and addresses the current deficiencies in machine ToM within datasets and tasks. Specifically, it offers tasks at two difficulty levels, assessing Belief, Desire and Intention (BDI) reasoning in both simple and complex scenarios. We conduct evaluations on several mainstream methods of VideoQA and diagnose their capabilities with zero shot, few shot and supervised learning. We find that the performance of pre-trained models on cognitive reasoning tasks remains unsatisfactory. To counter this challenge, we undertake thorough analysis and experimentation, ultimately presenting two guidelines to enhance cognitive reasoning derived from ablation analysis.
- [1310] arXiv:2402.07408 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: Large Language Models are Few-shot Generators: Proposing Hybrid Prompt Algorithm To Generate Webshell Escape SamplesComments: 13 pages, 16 figuresSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: The frequent occurrence of cyber-attacks has made webshell attacks and defense gradually become a research hotspot in the field of network security. However, the lack of publicly available benchmark datasets and the over-reliance on manually defined rules for webshell escape sample generation have slowed down the progress of research related to webshell escape sample generation strategies and artificial intelligence-based webshell detection algorithms. To address the drawbacks of weak webshell sample escape capabilities, the lack of webshell datasets with complex malicious features, and to promote the development of webshell detection technology, we propose the Hybrid Prompt algorithm for webshell escape sample generation with the help of large language models. As a prompt algorithm specifically developed for webshell sample generation, the Hybrid Prompt algorithm not only combines various prompt ideas including Chain of Thought, Tree of Thought, but also incorporates various components such as webshell hierarchical module and few-shot example to facilitate the LLM in learning and reasoning webshell escape strategies. Experimental results show that the Hybrid Prompt algorithm can work with multiple LLMs with excellent code reasoning ability to generate high-quality webshell samples with high Escape Rate (88.61% with GPT-4 model on VIRUSTOTAL detection engine) and Survival Rate (54.98% with GPT-4 model).
- [1311] arXiv:2402.07412 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Auxiliary Reward Generation with Transition Distance Representation LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Reinforcement learning (RL) has shown its strength in challenging sequential decision-making problems. The reward function in RL is crucial to the learning performance, as it serves as a measure of the task completion degree. In real-world problems, the rewards are predominantly human-designed, which requires laborious tuning, and is easily affected by human cognitive biases. To achieve automatic auxiliary reward generation, we propose a novel representation learning approach that can measure the ``transition distance'' between states. Building upon these representations, we introduce an auxiliary reward generation technique for both single-task and skill-chaining scenarios without the need for human knowledge. The proposed approach is evaluated in a wide range of manipulation tasks. The experiment results demonstrate the effectiveness of measuring the transition distance between states and the induced improvement by auxiliary rewards, which not only promotes better learning efficiency but also increases convergent stability.
- [1312] arXiv:2402.07415 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Context-aware Multi-Model Object Detection for Diversely Heterogeneous Compute SystemsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract: In recent years, deep neural networks (DNNs) have gained widespread adoption for continuous mobile object detection (OD) tasks, particularly in autonomous systems. However, a prevalent issue in their deployment is the one-size-fits-all approach, where a single DNN is used, resulting in inefficient utilization of computational resources. This inefficiency is particularly detrimental in energy-constrained systems, as it degrades overall system efficiency. We identify that, the contextual information embedded in the input data stream (e.g. the frames in the camera feed that the OD models are run on) could be exploited to allow a more efficient multi-model-based OD process. In this paper, we propose SHIFT which continuously selects from a variety of DNN-based OD models depending on the dynamically changing contextual information and computational constraints. During this selection, SHIFT uniquely considers multi-accelerator execution to better optimize the energy-efficiency while satisfying the latency constraints. Our proposed methodology results in improvements of up to 7.5x in energy usage and 2.8x in latency compared to state-of-the-art GPU-based single model OD approaches.
- [1313] arXiv:2402.07419 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Conditional Generative Models are Sufficient to Sample from Any Causal Effect EstimandSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Methodology (stat.ME); Machine Learning (stat.ML)
Abstract: Causal inference from observational data has recently found many applications in machine learning. While sound and complete algorithms exist to compute causal effects, many of these algorithms require explicit access to conditional likelihoods over the observational distribution, which is difficult to estimate in the high-dimensional regime, such as with images. To alleviate this issue, researchers have approached the problem by simulating causal relations with neural models and obtained impressive results. However, none of these existing approaches can be applied to generic scenarios such as causal graphs on image data with latent confounders, or obtain conditional interventional samples. In this paper, we show that any identifiable causal effect given an arbitrary causal graph can be computed through push-forward computations of conditional generative models. Based on this result, we devise a diffusion-based approach to sample from any (conditional) interventional distribution on image data. To showcase our algorithm's performance, we conduct experiments on a Colored MNIST dataset having both the treatment ($X$) and the target variables ($Y$) as images and obtain interventional samples from $P(y|do(x))$. As an application of our algorithm, we evaluate two large conditional generative models that are pre-trained on the CelebA dataset by analyzing the strength of spurious correlations and the level of disentanglement they achieve.
- [1314] arXiv:2402.07435 (cross-list from q-fin.ST) [ pdf , ps , other ]
-
Title: Analyzing Currency Fluctuations: A Comparative Study of GARCH, EWMA, and IV Models for GBP/USD and EUR/GBP PairsSubjects: Statistical Finance (q-fin.ST) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: In this study, we examine the fluctuation in the value of the Great Britain Pound (GBP). We focus particularly on its relationship with the United States Dollar (USD) and the Euro (EUR) currency pairs. Utilizing data from June 15, 2018, to June 15, 2023, we apply various mathematical models to assess their effectiveness in predicting the 20-day variation in the pairs' daily returns. Our analysis involves the implementation of Exponentially Weighted Moving Average (EWMA), Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models, and Implied Volatility (IV) models. To evaluate their performance, we compare the accuracy of their predictions using Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) metrics. We delve into the intricacies of GARCH models, examining their statistical characteristics when applied to the provided dataset. Our findings suggest the existence of asymmetric returns in the EUR/GBP pair, while such evidence is inconclusive for the GBP/USD pair. Additionally, we observe that GARCH-type models better fit the data when assuming residuals follow a standard t-distribution rather than a standard normal distribution. Furthermore, we investigate the efficacy of different forecasting techniques within GARCH-type models. Comparing rolling window forecasts to expanding window forecasts, we find no definitive superiority in either approach across the tested scenarios. Our experiments reveal that for the GBP/USD pair, the most accurate volatility forecasts stem from the utilization of GARCH models employing a rolling window methodology. Conversely, for the EUR/GBP pair, optimal forecasts are derived from GARCH models and Ordinary Least Squares (OLS) models incorporating the annualized implied volatility of the exchange rate as an independent variable.
- [1315] arXiv:2402.07437 (cross-list from cs.GT) [ pdf , ps , other ]
-
Title: Learning Optimal Tax Design in Nonatomic Congestion GamesComments: 19 pagesSubjects: Computer Science and Game Theory (cs.GT) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Abstract: We study how to learn the optimal tax design to maximize the efficiency in nonatomic congestion games. It is known that self-interested behavior among the players can damage the system's efficiency. Tax mechanisms is a common method to alleviate this issue and induce socially optimal behavior. In this work, we take the initial step for learning the optimal tax that can minimize the social cost with \emph{equilibrium feedback}, i.e., the tax designer can only observe the equilibrium state under the enforced tax. Existing algorithms are not applicable due to the exponentially large tax function space, nonexistence of the gradient, and nonconvexity of the objective. To tackle these challenges, our algorithm leverages several novel components: (1) piece-wise linear tax to approximate the optimal tax; (2) an extra linear term to guarantee a strongly convex potential function; (3) efficient subroutine to find the ``boundary'' tax. The algorithm can find an $\epsilon$-optimal tax with $O(\beta F^2/\epsilon)$ sample complexity, where $\beta$ is the smoothness of the cost function and $F$ is the number of facilities.
- [1316] arXiv:2402.07448 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: AraSpider: Democratizing Arabic-to-SQLComments: 11 pages, 4 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Databases (cs.DB); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Abstract: This study presents AraSpider, the first Arabic version of the Spider dataset, aimed at improving natural language processing (NLP) in the Arabic-speaking community. Four multilingual translation models were tested for their effectiveness in translating English to Arabic. Additionally, two models were assessed for their ability to generate SQL queries from Arabic text. The results showed that using back translation significantly improved the performance of both ChatGPT 3.5 and SQLCoder models, which are considered top performers on the Spider dataset. Notably, ChatGPT 3.5 demonstrated high-quality translation, while SQLCoder excelled in text-to-SQL tasks. The study underscores the importance of incorporating contextual schema and employing back translation strategies to enhance model performance in Arabic NLP tasks. Moreover, the provision of detailed methodologies for reproducibility and translation of the dataset into other languages highlights the research's commitment to promoting transparency and collaborative knowledge sharing in the field. Overall, these contributions advance NLP research, empower Arabic-speaking researchers, and enrich the global discourse on language comprehension and database interrogation.
- [1317] arXiv:2402.07452 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: TriAug: Out-of-Distribution Detection for Imbalanced Breast Lesion in UltrasoundSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Different diseases, such as histological subtypes of breast lesions, have severely varying incidence rates. Even trained with substantial amount of in-distribution (ID) data, models often encounter out-of-distribution (OOD) samples belonging to unseen classes in clinical reality. To address this, we propose a novel framework built upon a long-tailed OOD detection task for breast ultrasound images. It is equipped with a triplet state augmentation (TriAug) which improves ID classification accuracy while maintaining a promising OOD detection performance. Meanwhile, we designed a balanced sphere loss to handle the class imbalanced problem. Experimental results show that the model outperforms state-of-art OOD approaches both in ID classification (F1-score=42.12%) and OOD detection (AUROC=78.06%).
- [1318] arXiv:2402.07465 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Score-Based Physics-Informed Neural Networks for High-Dimensional Fokker-Planck EquationsComments: 22 pagesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Dynamical Systems (math.DS); Numerical Analysis (math.NA); Machine Learning (stat.ML)
Abstract: The Fokker-Planck (FP) equation is a foundational PDE in stochastic processes. However, curse of dimensionality (CoD) poses challenge when dealing with high-dimensional FP PDEs. Although Monte Carlo and vanilla Physics-Informed Neural Networks (PINNs) have shown the potential to tackle CoD, both methods exhibit numerical errors in high dimensions when dealing with the probability density function (PDF) associated with Brownian motion. The point-wise PDF values tend to decrease exponentially as dimension increases, surpassing the precision of numerical simulations and resulting in substantial errors. Moreover, due to its massive sampling, Monte Carlo fails to offer fast sampling. Modeling the logarithm likelihood (LL) via vanilla PINNs transforms the FP equation into a difficult HJB equation, whose error grows rapidly with dimension. To this end, we propose a novel approach utilizing a score-based solver to fit the score function in SDEs. The score function, defined as the gradient of the LL, plays a fundamental role in inferring LL and PDF and enables fast SDE sampling. Three fitting methods, Score Matching (SM), Sliced SM (SSM), and Score-PINN, are introduced. The proposed score-based SDE solver operates in two stages: first, employing SM, SSM, or Score-PINN to acquire the score; and second, solving the LL via an ODE using the obtained score. Comparative evaluations across these methods showcase varying trade-offs. The proposed method is evaluated across diverse SDEs, including anisotropic OU processes, geometric Brownian, and Brownian with varying eigenspace. We also test various distributions, including Gaussian, Log-normal, Laplace, and Cauchy. The numerical results demonstrate the score-based SDE solver's stability, speed, and performance across different settings, solidifying its potential as a solution to CoD for high-dimensional FP equations.
- [1319] arXiv:2402.07501 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: One Train for Two Tasks: An Encrypted Traffic Classification Framework Using Supervised Contrastive LearningComments: The code is available at this https URLSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: As network security receives widespread attention, encrypted traffic classification has become the current research focus. However, existing methods conduct traffic classification without sufficiently considering the common characteristics between data samples, leading to suboptimal performance. Moreover, they train the packet-level and flow-level classification tasks independently, which is redundant because the packet representations learned in the packet-level task can be exploited by the flow-level task. Therefore, in this paper, we propose an effective model named a Contrastive Learning Enhanced Temporal Fusion Encoder (CLE-TFE). In particular, we utilize supervised contrastive learning to enhance the packet-level and flow-level representations and perform graph data augmentation on the byte-level traffic graph so that the fine-grained semantic-invariant characteristics between bytes can be captured through contrastive learning. We also propose cross-level multi-task learning, which simultaneously accomplishes the packet-level and flow-level classification tasks in the same model with one training. Further experiments show that CLE-TFE achieves the best overall performance on the two tasks, while its computational overhead (i.e., floating point operations, FLOPs) is only about 1/14 of the pre-trained model (e.g., ET-BERT). We release the code at this https URL
- [1320] arXiv:2402.07513 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: The Balancing Act: Unmasking and Alleviating ASR Biases in PortugueseComments: EACL-2024 LT-EDI WorkshopSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: In the field of spoken language understanding, systems like Whisper and Multilingual Massive Speech (MMS) have shown state-of-the-art performances. This study is dedicated to a comprehensive exploration of the Whisper and MMS systems, with a focus on assessing biases in automatic speech recognition (ASR) inherent to casual conversation speech specific to the Portuguese language. Our investigation encompasses various categories, including gender, age, skin tone color, and geo-location. Alongside traditional ASR evaluation metrics such as Word Error Rate (WER), we have incorporated p-value statistical significance for gender bias analysis. Furthermore, we extensively examine the impact of data distribution and empirically show that oversampling techniques alleviate such stereotypical biases. This research represents a pioneering effort in quantifying biases in the Portuguese language context through the application of MMS and Whisper, contributing to a better understanding of ASR systems' performance in multilingual settings.
- [1321] arXiv:2402.07540 (cross-list from cs.HC) [ pdf , ps , html , other ]
-
Title: PKG API: A Tool for Personal Knowledge Graph ManagementNolwenn Bernard , Ivica Kostric , Weronika Łajewska , Krisztian Balog , Petra Galuščáková , Vinay Setty , Martin G. SkjævelandSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Personal knowledge graphs (PKGs) offer individuals a way to store and consolidate their fragmented personal data in a central place, improving service personalization while maintaining full user control. Despite their potential, practical PKG implementations with user-friendly interfaces remain scarce. This work addresses this gap by proposing a complete solution to represent, manage, and interface with PKGs. Our approach includes (1) a user-facing PKG Client, enabling end-users to administer their personal data easily via natural language statements, and (2) a service-oriented PKG API. To tackle the complexity of representing these statements within a PKG, we present an RDF-based PKG vocabulary that supports this, along with properties for access rights and provenance.
- [1322] arXiv:2402.07543 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Show Me How It's Done: The Role of Explanations in Fine-Tuning Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Our research demonstrates the significant benefits of using fine-tuning with explanations to enhance the performance of language models. Unlike prompting, which maintains the model's parameters, fine-tuning allows the model to learn and update its parameters during a training phase. In this study, we applied fine-tuning to various sized language models using data that contained explanations of the output rather than merely presenting the answers. We found that even smaller language models with as few as 60 million parameters benefited substantially from this approach. Interestingly, our results indicated that the detailed explanations were more beneficial to smaller models than larger ones, with the latter gaining nearly the same advantage from any form of explanation, irrespective of its length. Additionally, we demonstrate that the inclusion of explanations enables the models to solve tasks that they were not able to solve without explanations. Lastly, we argue that despite the challenging nature of adding explanations, samples that contain explanations not only reduce the volume of data required for training but also promote a more effective generalization by the model. In essence, our findings suggest that fine-tuning with explanations significantly bolsters the performance of large language models.
- [1323] arXiv:2402.07547 (cross-list from cs.MA) [ pdf , ps , other ]
-
Title: Ensuring trustworthy and ethical behaviour in intelligent logical agentsJournal-ref: Journal of Logic and Computation, Volume 32, Issue 2, March 2022, Pages 443-478Subjects: Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Symbolic Computation (cs.SC)
Abstract: Autonomous Intelligent Agents are employed in many applications upon which the life and welfare of living beings and vital social functions may depend. Therefore, agents should be trustworthy. A priori certification techniques (i.e., techniques applied prior to system's deployment) can be useful, but are not sufficient for agents that evolve, and thus modify their epistemic and belief state, and for open Multi-Agent Systems, where heterogeneous agents can join or leave the system at any stage of its operation. In this paper, we propose/refine/extend dynamic (runtime) logic-based self-checking techniques, devised in order to be able to ensure agents' trustworthy and ethical behaviour.
- [1324] arXiv:2402.07562 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: Discovering Universal Semantic Triggers for Text-to-Image SynthesisComments: 9 pages, 5 figures. Work in progressSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: Recently text-to-image models have gained widespread attention in the community due to their controllable and high-quality generation ability. However, the robustness of such models and their potential ethical issues have not been fully explored. In this paper, we introduce Universal Semantic Trigger, a meaningless token sequence that can be added at any location within the input text yet can induce generated images towards a preset semantic this http URL thoroughly investigate it, we propose Semantic Gradient-based Search (SGS) framework. SGS automatically discovers the potential universal semantic triggers based on the given semantic targets. Furthermore, we design evaluation metrics to comprehensively evaluate semantic shift of images caused by these triggers. And our empirical analyses reveal that the mainstream open-source text-to-image models are vulnerable to our triggers, which could pose significant ethical threats. Our work contributes to a further understanding of text-to-image synthesis and helps users to automatically auditing their models before deployment.
- [1325] arXiv:2402.07570 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Only the Curve Shape Matters: Training Foundation Models for Zero-Shot Multivariate Time Series Forecasting through Next Curve Shape PredictionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: We present General Time Transformer (GTT), an encoder-only style foundation model for zero-shot multivariate time series forecasting. GTT is pretrained on a large dataset of 200M high-quality time series samples spanning diverse domains. In our proposed framework, the task of multivariate time series forecasting is formulated as a channel-wise next curve shape prediction problem, where each time series sample is represented as a sequence of non-overlapping curve shapes with a unified numerical magnitude. GTT is trained to predict the next curve shape based on a window of past curve shapes in a channel-wise manner. Experimental results demonstrate that GTT exhibits superior zero-shot multivariate forecasting capabilities on unseen time series datasets, even surpassing state-of-the-art supervised baselines. Additionally, we investigate the impact of varying GTT model parameters and training dataset scales, observing that the scaling law also holds in the context of zero-shot multivariate time series forecasting.
- [1326] arXiv:2402.07610 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via BootstrappingHaoyu Wang , Guozheng Ma , Ziqiao Meng , Zeyu Qin , Li Shen , Zhong Zhang , Bingzhe Wu , Liu Liu , Yatao Bian , Tingyang Xu , Xueqian Wang , Peilin ZhaoSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Self-alignment is an effective way to reduce the cost of human annotation while ensuring promising model capability. However, most current methods complete the data collection and training steps in a single round, which may overlook the continuously improving ability of self-aligned models. This gives rise to a key query: What if we do multi-time bootstrapping self-alignment? Does this strategy enhance model performance or lead to rapid degradation? In this paper, our pioneering exploration delves into the impact of bootstrapping self-alignment on large language models. Our findings reveal that bootstrapping self-alignment markedly surpasses the single-round approach, by guaranteeing data diversity from in-context learning. To further exploit the capabilities of bootstrapping, we investigate and adjust the training order of data, which yields improved performance of the model. Drawing on these findings, we propose Step-On-Feet Tuning (SOFT) which leverages model's continuously enhanced few-shot ability to boost zero or one-shot performance. Based on easy-to-hard training recipe, we propose SOFT+ which further boost self-alignment's performance. Our experiments demonstrate the efficiency of SOFT (SOFT+) across various classification and generation tasks, highlighting the potential of bootstrapping self-alignment on continually enhancing model alignment performance.
- [1327] arXiv:2402.07616 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Anchor-based Large Language ModelsComments: 16 pages. Work was done when Jianhui Pang and Fanghua Ye were interning at Tencent AI Lab. Longyue Wang is the corresponding authorSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large language models (LLMs) predominantly employ decoder-only transformer architectures, necessitating the retention of keys/values information for historical tokens to provide contextual information and avoid redundant computation. However, the substantial size and parameter volume of these LLMs require massive GPU memory. This memory demand increases with the length of the input text, leading to an urgent need for more efficient methods of information storage and processing. This study introduces Anchor-based LLMs (AnLLMs), which utilize an innovative anchor-based self-attention network (AnSAN) and also an anchor-based inference strategy. This approach enables LLMs to compress sequence information into an anchor token, reducing the keys/values cache and enhancing inference efficiency. Experiments on question-answering benchmarks reveal that AnLLMs maintain similar accuracy levels while achieving up to 99% keys/values cache reduction and up to 3.5 times faster inference. Despite a minor compromise in accuracy, the substantial enhancements of AnLLMs employing the AnSAN technique in resource utilization and computational efficiency underscore their potential for practical LLM applications.
- [1328] arXiv:2402.07619 (cross-list from cs.SD) [ pdf , ps , other ]
-
Title: Developing a Multi-variate Prediction Model For COVID-19 From Crowd-sourced Respiratory Voice DataComments: arXiv admin note: text overlap with arXiv:2209.03727Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Abstract: COVID-19 has affected more than 223 countries worldwide and in the Post-COVID Era, there is a pressing need for non-invasive, low-cost, and highly scalable solutions to detect COVID-19. We develop a deep learning model to identify COVID-19 from voice recording data. The novelty of this work is in the development of deep learning models for COVID-19 identification from only voice recordings. We use the Cambridge COVID-19 Sound database which contains 893 speech samples, crowd-sourced from 4352 participants via a COVID-19 Sounds app. Voice features including Mel-spectrograms and Mel-frequency cepstral coefficients (MFCC) and CNN Encoder features are extracted. Based on the voice data, we develop deep learning classification models to detect COVID-19 cases. These models include Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) and Hidden-Unit BERT (HuBERT). We compare their predictive power to baseline machine learning models. HuBERT achieves the highest accuracy of 86\% and the highest AUC of 0.93. The results achieved with the proposed models suggest promising results in COVID-19 diagnosis from voice recordings when compared to the results obtained from the state-of-the-art.
- [1329] arXiv:2402.07625 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Autonomous Data Selection with Language Models for Mathematical TextsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: To improve language models' proficiency in mathematical reasoning via continual pretraining, we introduce a novel strategy that leverages base language models for autonomous data selection. Departing from conventional supervised fine-tuning or trained classifiers with human-annotated data, our approach Autonomous Data Selection (AutoDS) utilizes meta-prompted language models as zero-shot verifiers to evaluate and select high-quality mathematical content autonomously. To demonstrate the efficacy of our method, we continuously pretrained a 7B-parameter language model on our curated dataset, achieving substantial improvements in downstream performance on the MATH, GSM8K, and BIG-Bench Hard (BBH) tasks with a token amount reduced by orders of magnitude compared to previous continual pretraining works. Our method showcases a 2 times increase in pretraining token efficiency compared to state-of-the-art baselines, underscoring the potential of our approach in enhancing models' mathematical reasoning capabilities. The AutoMathText dataset is available at this https URL . The code is available at this https URL .
- [1330] arXiv:2402.07639 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Tighter Bounds on the Information Bottleneck with Application to Deep LearningComments: 10 pages, 5 figures, code included in github repoSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Information Theory (cs.IT)
Abstract: Deep Neural Nets (DNNs) learn latent representations induced by their downstream task, objective function, and other parameters. The quality of the learned representations impacts the DNN's generalization ability and the coherence of the emerging latent space. The Information Bottleneck (IB) provides a hypothetically optimal framework for data modeling, yet it is often intractable. Recent efforts combined DNNs with the IB by applying VAE-inspired variational methods to approximate bounds on mutual information, resulting in improved robustness to adversarial attacks. This work introduces a new and tighter variational bound for the IB, improving performance of previous IB-inspired DNNs. These advancements strengthen the case for the IB and its variational approximations as a data modeling framework, and provide a simple method to significantly enhance the adversarial robustness of classifier DNNs.
- [1331] arXiv:2402.07640 (cross-list from cs.MM) [ pdf , ps , html , other ]
-
Title: Synthesizing Sentiment-Controlled Feedback For Multimodal Text and Image DataSubjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI)
Abstract: The ability to generate sentiment-controlled feedback in response to multimodal inputs, comprising both text and images, addresses a critical gap in human-computer interaction by enabling systems to provide empathetic, accurate, and engaging responses. This capability has profound applications in healthcare, marketing, and education. To this end, we construct a large-scale Controllable Multimodal Feedback Synthesis (CMFeed) dataset and propose a controllable feedback synthesis system. The proposed system includes an encoder, decoder, and controllability block for textual and visual inputs. It extracts textual and visual features using a transformer and Faster R-CNN networks and combines them to generate feedback. The CMFeed dataset encompasses images, text, reactions to the post, human comments with relevance scores, and reactions to the comments. The reactions to the post and comments are utilized to train the proposed model to produce feedback with a particular (positive or negative) sentiment. A sentiment classification accuracy of 77.23% has been achieved, 18.82% higher than the accuracy without using the controllability. Moreover, the system incorporates a similarity module for assessing feedback relevance through rank-based metrics. It implements an interpretability technique to analyze the contribution of textual and visual features during the generation of uncontrolled and controlled feedback.
- [1332] arXiv:2402.07680 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: AYDIV: Adaptable Yielding 3D Object Detection via Integrated Contextual Vision TransformerTanmoy Dam , Sanjay Bhargav Dharavath , Sameer Alam , Nimrod Lilith , Supriyo Chakraborty , Mir FeroskhanComments: This paper has been accepted for ICRA 2024, and copyright will automatically transfer to IEEE upon its availability on the IEEE portalSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: Combining LiDAR and camera data has shown potential in enhancing short-distance object detection in autonomous driving systems. Yet, the fusion encounters difficulties with extended distance detection due to the contrast between LiDAR's sparse data and the dense resolution of cameras. Besides, discrepancies in the two data representations further complicate fusion methods. We introduce AYDIV, a novel framework integrating a tri-phase alignment process specifically designed to enhance long-distance detection even amidst data discrepancies. AYDIV consists of the Global Contextual Fusion Alignment Transformer (GCFAT), which improves the extraction of camera features and provides a deeper understanding of large-scale patterns; the Sparse Fused Feature Attention (SFFA), which fine-tunes the fusion of LiDAR and camera details; and the Volumetric Grid Attention (VGA) for a comprehensive spatial data fusion. AYDIV's performance on the Waymo Open Dataset (WOD) with an improvement of 1.24% in mAPH value(L2 difficulty) and the Argoverse2 Dataset with a performance improvement of 7.40% in AP value demonstrates its efficacy in comparison to other existing fusion-based methods. Our code is publicly available at this https URL
- [1333] arXiv:2402.07681 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Large Language Models "Ad Referendum": How Good Are They at Machine Translation in the Legal Domain?Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: This study evaluates the machine translation (MT) quality of two state-of-the-art large language models (LLMs) against a tradition-al neural machine translation (NMT) system across four language pairs in the legal domain. It combines automatic evaluation met-rics (AEMs) and human evaluation (HE) by professional transla-tors to assess translation ranking, fluency and adequacy. The re-sults indicate that while Google Translate generally outperforms LLMs in AEMs, human evaluators rate LLMs, especially GPT-4, comparably or slightly better in terms of producing contextually adequate and fluent translations. This discrepancy suggests LLMs' potential in handling specialized legal terminology and context, highlighting the importance of human evaluation methods in assessing MT quality. The study underscores the evolving capabil-ities of LLMs in specialized domains and calls for reevaluation of traditional AEMs to better capture the nuances of LLM-generated translations.
- [1334] arXiv:2402.07689 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: OrderBkd: Textual backdoor attack through repositioningSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The use of third-party datasets and pre-trained machine learning models poses a threat to NLP systems due to possibility of hidden backdoor attacks. Existing attacks involve poisoning the data samples such as insertion of tokens or sentence paraphrasing, which either alter the semantics of the original texts or can be detected. Our main difference from the previous work is that we use the reposition of a two words in a sentence as a trigger. By designing and applying specific part-of-speech (POS) based rules for selecting these tokens, we maintain high attack success rate on SST-2 and AG classification datasets while outperforming existing attacks in terms of perplexity and semantic similarity to the clean samples. In addition, we show the robustness of our attack to the ONION defense method. All the code and data for the paper can be obtained at this https URL .
- [1335] arXiv:2402.07703 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Online Sequential Decision-Making with Unknown DelaysSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In the field of online sequential decision-making, we address the problem with delays utilizing the framework of online convex optimization (OCO), where the feedback of a decision can arrive with an unknown delay. Unlike previous research that is limited to Euclidean norm and gradient information, we propose three families of delayed algorithms based on approximate solutions to handle different types of received feedback. Our proposed algorithms are versatile and applicable to universal norms. Specifically, we introduce a family of Follow the Delayed Regularized Leader algorithms for feedback with full information on the loss function, a family of Delayed Mirror Descent algorithms for feedback with gradient information on the loss function and a family of Simplified Delayed Mirror Descent algorithms for feedback with the value information of the loss function's gradients at corresponding decision points. For each type of algorithm, we provide corresponding regret bounds under cases of general convexity and relative strong convexity, respectively. We also demonstrate the efficiency of each algorithm under different norms through concrete examples. Furthermore, our theoretical results are consistent with the current best bounds when degenerated to standard settings.
- [1336] arXiv:2402.07712 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Model Collapse Demystified: The Case of RegressionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: In the era of proliferation of large language and image generation models, the phenomenon of "model collapse" refers to the situation whereby as a model is trained recursively on data generated from previous generations of itself over time, its performance degrades until the model eventually becomes completely useless, i.e the model collapses. In this work, we study this phenomenon in the setting of high-dimensional regression and obtain analytic formulae which quantitatively outline this phenomenon in a broad range of regimes. In the special case of polynomial decaying spectral and source conditions, we obtain modified scaling laws which exhibit new crossover phenomena from fast to slow rates. We also propose a simple strategy based on adaptive regularization to mitigate model collapse. Our theoretical results are validated with experiments.
- [1337] arXiv:2402.07754 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language ModelsJiacheng Ye , Shansan Gong , Liheng Chen , Lin Zheng , Jiahui Gao , Han Shi , Chuan Wu , Zhenguo Li , Wei Bi , Lingpeng KongSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Diffusion models have gained attention in text processing, offering many potential advantages over traditional autoregressive models. This work explores the integration of diffusion models and Chain-of-Thought (CoT), a well-established technique to improve the reasoning ability in autoregressive language models. We propose Diffusion-of-Thought (DoT), allowing reasoning steps to diffuse over time through the diffusion process. In contrast to traditional autoregressive language models that make decisions in a left-to-right, token-by-token manner, DoT offers more flexibility in the trade-off between computation and reasoning performance. Our experimental results demonstrate the effectiveness of DoT in multi-digit multiplication and grade school math problems. Additionally, DoT showcases promising self-correction abilities and benefits from existing reasoning-enhancing techniques like self-consistency decoding. Our findings contribute to the understanding and development of reasoning capabilities in diffusion language models.
- [1338] arXiv:2402.07757 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Towards an Understanding of Stepwise Inference in Transformers: A Synthetic Graph Navigation ModelMikail Khona , Maya Okawa , Jan Hula , Rahul Ramesh , Kento Nishi , Robert Dick , Ekdeep Singh Lubana , Hidenori TanakaSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Stepwise inference protocols, such as scratchpads and chain-of-thought, help language models solve complex problems by decomposing them into a sequence of simpler subproblems. Despite the significant gain in performance achieved via these protocols, the underlying mechanisms of stepwise inference have remained elusive. To address this, we propose to study autoregressive Transformer models on a synthetic task that embodies the multi-step nature of problems where stepwise inference is generally most useful. Specifically, we define a graph navigation problem wherein a model is tasked with traversing a path from a start to a goal node on the graph. Despite is simplicity, we find we can empirically reproduce and analyze several phenomena observed at scale: (i) the stepwise inference reasoning gap, the cause of which we find in the structure of the training data; (ii) a diversity-accuracy tradeoff in model generations as sampling temperature varies; (iii) a simplicity bias in the model's output; and (iv) compositional generalization and a primacy bias with in-context exemplars. Overall, our work introduces a grounded, synthetic framework for studying stepwise inference and offers mechanistic hypotheses that can lay the foundation for a deeper understanding of this phenomenon.
- [1339] arXiv:2402.07812 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Retrieval-Augmented Thought Process as Sequential Decision MakingComments: 17 pages, 18 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Abstract: Large Language Models (LLMs) have demonstrated their strong ability to assist people and show "sparks of intelligence". However, several open challenges hinder their wider application: such as concerns over privacy, tendencies to produce hallucinations, and difficulties in handling long contexts. In this work, we address those challenges by introducing the Retrieval-Augmented Thought Process (RATP). Given access to external knowledge, RATP formulates the thought generation of LLMs as a multiple-step decision process. To optimize such a thought process, RATP leverages Monte-Carlo Tree Search, and learns a Q-value estimator that permits cost-efficient inference. In addressing the task of question-answering with private data, where ethical and security concerns limit LLM training methods, RATP achieves a 50% improvement over existing in-context retrieval-augmented language models.
- [1340] arXiv:2402.07814 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: PBADet: A One-Stage Anchor-Free Approach for Part-Body AssociationZhongpai Gao , Huayi Zhou , Abhishek Sharma , Meng Zheng , Benjamin Planche , Terrence Chen , Ziyan WuComments: Accepted by ICLR2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: The detection of human parts (e.g., hands, face) and their correct association with individuals is an essential task, e.g., for ubiquitous human-machine interfaces and action recognition. Traditional methods often employ multi-stage processes, rely on cumbersome anchor-based systems, or do not scale well to larger part sets. This paper presents PBADet, a novel one-stage, anchor-free approach for part-body association detection. Building upon the anchor-free object representation across multi-scale feature maps, we introduce a singular part-to-body center offset that effectively encapsulates the relationship between parts and their parent bodies. Our design is inherently versatile and capable of managing multiple parts-to-body associations without compromising on detection accuracy or robustness. Comprehensive experiments on various datasets underscore the efficacy of our approach, which not only outperforms existing state-of-the-art techniques but also offers a more streamlined and efficient solution to the part-body association challenge.
- [1341] arXiv:2402.07818 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Differentially Private Zeroth-Order Methods for Scalable Large Language Model FinetuningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Fine-tuning on task-specific datasets is a widely-embraced paradigm of harnessing the powerful capability of pretrained LLMs for various downstream tasks. Due to the popularity of LLMs fine-tuning and its accompanying privacy concerns, differentially private (DP) fine-tuning of pretrained LLMs has been widely used to safeguarding the privacy of task-specific datasets. Lying at the design core of DP LLM fine-tuning methods is the satisfactory tradeoff among privacy, utility, and scalability. Most existing methods build upon the seminal work of DP-SGD. Despite pushing the scalability of DP-SGD to its limit, DP-SGD-based fine-tuning methods are unfortunately limited by the inherent inefficiency of SGD.
In this paper, we investigate the potential of DP zeroth-order methods for LLM pretraining, which avoids the scalability bottleneck of SGD by approximating the gradient with the more efficient zeroth-order gradient. Rather than treating the zeroth-order method as a drop-in replacement for SGD, this paper presents a comprehensive study both theoretically and empirically. First, we propose the stagewise DP zeroth-order method (DP-ZOSO) that dynamically schedules key hyperparameters. This design is grounded on the synergy between DP random perturbation and the gradient approximation error of the zeroth-order method, and its effect on fine-tuning trajectory.
We provide theoretical analysis for both proposed methods. We conduct extensive empirical analysis on both encoder-only masked language model and decoder-only autoregressive language model, achieving impressive results in terms of scalability and utility (compared with DPZero, DP-ZOPO improves 4.5% on SST-5, 5.5% on MNLI with RoBERTa-Large and 9.2% on CB, 3.9% on BoolQ with OPT-2.7B when $\epsilon=4$). - [1342] arXiv:2402.07845 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Unsupervised Optimisation of GNNs for Node ClusteringSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Graph Neural Networks (GNNs) can be trained to detect communities within a graph by learning from the duality of feature and connectivity information. Currently, the common approach for optimisation of GNNs is to use comparisons to ground-truth for hyperparameter tuning and model selection. In this work, we show that nodes can be clustered into communities with GNNs by solely optimising for modularity, without any comparison to ground-truth. Although modularity is a graph partitioning quality metric, we show that this can be used to optimise GNNs that also encode features without a drop in performance. We take it a step further and also study whether the unsupervised metric performance can predict ground-truth performance. To investigate why modularity can be used to optimise GNNs, we design synthetic experiments that show the limitations of this approach. The synthetic graphs are created to highlight current capabilities in distinct, random and zero information space partitions in attributed graphs. We conclude that modularity can be used for hyperparameter optimisation and model selection on real-world datasets as well as being a suitable proxy for predicting ground-truth performance, however, GNNs fail to balance the information duality when the spaces contain conflicting signals.
- [1343] arXiv:2402.07859 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Lissard: Long and Simple Sequential Reasoning DatasetsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Language models are now capable of solving tasks that require dealing with long sequences consisting of hundreds of thousands of tokens. However, they often fail on tasks that require repetitive use of simple rules, even on sequences that are much shorter than those seen during training. For example, state-of-the-art LLMs can find common items in two lists with up to 20 items but fail when lists have 80 items. In this paper, we introduce Lissard, a benchmark comprising seven tasks whose goal is to assess the ability of models to process and generate wide-range sequence lengths, requiring repetitive procedural execution. Our evaluation of open-source (Mistral-7B and Mixtral-8x7B) and proprietary models (GPT-3.5 and GPT-4) show a consistent decline in performance across all models as the complexity of the sequence increases. The datasets and code are available at this https URL
- [1344] arXiv:2402.07860 (cross-list from cs.SI) [ pdf , ps , html , other ]
-
Title: On the Detection of Reviewer-Author Collusion Rings From Paper BiddingSubjects: Social and Information Networks (cs.SI) ; Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)
Abstract: A major threat to the peer-review systems of computer science conferences is the existence of "collusion rings" between reviewers. In such collusion rings, reviewers who have also submitted their own papers to the conference work together to manipulate the conference's paper assignment, with the aim of being assigned to review each other's papers. The most straightforward way that colluding reviewers can manipulate the paper assignment is by indicating their interest in each other's papers through strategic paper bidding. One potential approach to solve this important problem would be to detect the colluding reviewers from their manipulated bids, after which the conference can take appropriate action. While prior work has developed effective techniques to detect other kinds of fraud, no research has yet established that detecting collusion rings is even possible. In this work, we tackle the question of whether it is feasible to detect collusion rings from the paper bidding. To answer this question, we conduct empirical analysis of two realistic conference bidding datasets, including evaluations of existing algorithms for fraud detection in other applications. We find that collusion rings can achieve considerable success at manipulating the paper assignment while remaining hidden from detection: for example, in one dataset, undetected colluders are able to achieve assignment to up to 30% of the papers authored by other colluders. In addition, when 10 colluders bid on all of each other's papers, no detection algorithm outputs a group of reviewers with more than 31% overlap with the true colluders. These results suggest that collusion cannot be effectively detected from the bidding using popular existing tools, demonstrating the need to develop more complex detection algorithms as well as those that leverage additional metadata (e.g., reviewer-paper text-similarity scores).
- [1345] arXiv:2402.07862 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: AI-Augmented Predictions: LLM Assistants Improve Human Forecasting AccuracyComments: 18 pages (main text comprised of 15 pages, appendix comprised of three pages). 10 visualizations in the main text (four figures, six tables), three additional figures in the appendixSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Large language models (LLMs) show impressive capabilities, matching and sometimes exceeding human performance in many domains. This study explores the potential of LLMs to augment judgement in forecasting tasks. We evaluated the impact on forecasting accuracy of two GPT-4-Turbo assistants: one designed to provide high-quality advice ('superforecasting'), and the other designed to be overconfident and base-rate-neglecting. Participants (N = 991) had the option to consult their assigned LLM assistant throughout the study, in contrast to a control group that used a less advanced model (DaVinci-003) without direct forecasting support. Our preregistered analyses reveal that LLM augmentation significantly enhances forecasting accuracy by 23% across both types of assistants, compared to the control group. This improvement occurs despite the superforecasting assistant's higher accuracy in predictions, indicating the augmentation's benefit is not solely due to model prediction accuracy. Exploratory analyses showed a pronounced effect in one forecasting item, without which we find that the superforecasting assistant increased accuracy by 43%, compared with 28% for the biased assistant. We further examine whether LLM augmentation disproportionately benefits less skilled forecasters, degrades the wisdom-of-the-crowd by reducing prediction diversity, or varies in effectiveness with question difficulty. Our findings do not consistently support these hypotheses. Our results suggest that access to an LLM assistant, even a biased one, can be a helpful decision aid in cognitively demanding tasks where the answer is not known at the time of interaction.
- [1346] arXiv:2402.07865 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language ModelsComments: 22 pages, 11 figures. Training code and models: this https URL . Evaluation code: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Visually-conditioned language models (VLMs) have seen growing adoption in applications such as visual dialogue, scene understanding, and robotic task planning; adoption that has fueled a wealth of new models such as LLaVa, InstructBLIP, and PaLI-3. Despite the volume of new releases, key design decisions around image preprocessing, architecture, and optimization are under-explored, making it challenging to understand what factors account for model performance $-$ a challenge further complicated by the lack of objective, consistent evaluations. To address these gaps, we first compile a suite of standardized evaluations spanning visual question answering, object localization from language, and targeted challenge sets that probe properties such as hallucination; evaluations that provide calibrated, fine-grained insight into a VLM's capabilities. Second, we rigorously investigate VLMs along key design axes, including pretrained visual representations and quantifying the tradeoffs of using base vs. instruct-tuned language models, amongst others. We couple our analysis with three resource contributions: (1) a unified framework for evaluating VLMs, (2) optimized, flexible code for VLM training, and (3) checkpoints for all models, including a family of VLMs at the 7-13B scale that strictly outperform InstructBLIP and LLaVa v1.5, the state-of-the-art in open-source VLMs.
- [1347] arXiv:2402.07871 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Scaling Laws for Fine-Grained Mixture of ExpertsJakub Krajewski , Jan Ludziejewski , Kamil Adamczewski , Maciej Pióro , Michał Krutul , Szymon Antoniak , Kamil Ciebiera , Krystian Król , Tomasz Odrzygóźdź , Piotr Sankowski , Marek Cygan , Sebastian JaszczurSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Mixture of Experts (MoE) models have emerged as a primary solution for reducing the computational cost of Large Language Models. In this work, we analyze their scaling properties, incorporating an expanded range of variables. Specifically, we introduce a new hyperparameter, granularity, whose adjustment enables precise control over the size of the experts. Building on this, we establish scaling laws for fine-grained MoE, taking into account the number of training tokens, model size, and granularity. Leveraging these laws, we derive the optimal training configuration for a given computational budget. Our findings not only show that MoE models consistently outperform dense Transformers but also highlight that the efficiency gap between dense and MoE models widens as we scale up the model size and training budget. Furthermore, we demonstrate that the common practice of setting the size of experts in MoE to mirror the feed-forward layer is not optimal at almost any computational budget.
- [1348] arXiv:2402.07875 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial StatesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Systems and Control (eess.SY); Machine Learning (stat.ML)
Abstract: In modern machine learning, models can often fit training data in numerous ways, some of which perform well on unseen (test) data, while others do not. Remarkably, in such cases gradient descent frequently exhibits an implicit bias that leads to excellent performance on unseen data. This implicit bias was extensively studied in supervised learning, but is far less understood in optimal control (reinforcement learning). There, learning a controller applied to a system via gradient descent is known as policy gradient, and a question of prime importance is the extent to which a learned controller extrapolates to unseen initial states. This paper theoretically studies the implicit bias of policy gradient in terms of extrapolation to unseen initial states. Focusing on the fundamental Linear Quadratic Regulator (LQR) problem, we establish that the extent of extrapolation depends on the degree of exploration induced by the system when commencing from initial states included in training. Experiments corroborate our theory, and demonstrate its conclusions on problems beyond LQR, where systems are non-linear and controllers are neural networks. We hypothesize that real-world optimal control may be greatly improved by developing methods for informed selection of initial states to train on.
- [1349] arXiv:2402.07876 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Policy Improvement using Language Feedback ModelsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: We introduce Language Feedback Models (LFMs) that identify desirable behaviour - actions that help achieve tasks specified in the instruction - for imitation learning in instruction following. To train LFMs, we obtain feedback from Large Language Models (LLMs) on visual trajectories verbalized to language descriptions. First, by using LFMs to identify desirable behaviour to imitate, we improve in task-completion rate over strong behavioural cloning baselines on three distinct language grounding environments (Touchdown, ScienceWorld, and ALFWorld). Second, LFMs outperform using LLMs as experts to directly predict actions, when controlling for the number of LLM output tokens. Third, LFMs generalize to unseen environments, improving task-completion rate by 3.5-12.0% through one round of adaptation. Finally, LFM can be modified to provide human-interpretable feedback without performance loss, allowing human verification of desirable behaviour for imitation learning.
- [1350] arXiv:2402.07895 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Detection of Spider Mites on Labrador Beans through Machine Learning Approaches Using Custom DatasetsComments: Australasian Conference on Robotics and Automation (ACRA 2023)Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: Amidst growing food production demands, early plant disease detection is essential to safeguard crops; this study proposes a visual machine learning approach for plant disease detection, harnessing RGB and NIR data collected in real-world conditions through a JAI FS-1600D-10GE camera to build an RGBN dataset. A two-stage early plant disease detection model with YOLOv8 and a sequential CNN was used to train on a dataset with partial labels, which showed a 3.6% increase in mAP compared to a single-stage end-to-end segmentation model. The sequential CNN model achieved 90.62% validation accuracy utilising RGBN data. An average of 6.25% validation accuracy increase is found using RGBN in classification compared to RGB using ResNet15 and the sequential CNN models. Further research and dataset improvements are needed to meet food production demands.
- [1351] arXiv:2402.07901 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: FAST: Factorizable Attention for Speeding up TransformersSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Numerical Analysis (math.NA)
Abstract: Motivated by the factorization inherent in the original fast multipole method and the improved fast Gauss transform we introduce a factorable form of attention that operates efficiently in high dimensions. This approach reduces the computational and memory complexity of the attention mechanism in transformers from $O(N^2)$ to $O(N)$. In comparison to previous attempts, our work presents a linearly scaled attention mechanism that maintains the full representation of the attention matrix without compromising on sparsification and incorporates the all-to-all relationship between tokens. We explore the properties of our new attention metric and conduct tests in various standard settings. Results indicate that our attention mechanism has a robust performance and holds significant promise for diverse applications where self-attention is used.
- [1352] arXiv:2402.07909 (cross-list from cs.HC) [ pdf , ps , html , other ]
-
Title: Prompt4Vis: Prompting Large Language Models with Example Mining and Schema Filtering for Tabular Data VisualizationSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Databases (cs.DB)
Abstract: Data visualization (DV) systems are increasingly recognized for their profound capability to uncover insights from vast datasets, gaining attention across both industry and academia. Crafting data queries is an essential process within certain declarative visualization languages (DVLs, e.g., Vega-Lite, EChart.). The evolution of natural language processing (NLP) technologies has streamlined the use of natural language interfaces to visualize tabular data, offering a more accessible and intuitive user experience. However, current methods for converting natural language questions into data visualization queries, such as Seq2Vis, ncNet, and RGVisNet, despite utilizing complex neural network architectures, still fall short of expectations and have great room for improvement.
Large language models (LLMs) such as ChatGPT and GPT-4, have established new benchmarks in a variety of NLP tasks, fundamentally altering the landscape of the field. Inspired by these advancements, we introduce a novel framework, Prompt4Vis, leveraging LLMs and in-context learning to enhance the performance of generating data visualization from natural language. Prompt4Vis comprises two key components: (1) a multi-objective example mining module, designed to find out the truly effective examples that strengthen the LLM's in-context learning capabilities for text-to-vis; (2) a schema filtering module, which is proposed to simplify the schema of the database. Extensive experiments through 5-fold cross-validation on the NVBench dataset demonstrate the superiority of Prompt4Vis, which notably surpasses the state-of-the-art (SOTA) RGVisNet by approximately 35.9% and 71.3% on dev and test sets, respectively. To the best of our knowledge, Prompt4Vis is the first work that introduces in-context learning into the text-to-vis for generating data visualization queries. - [1353] arXiv:2402.07911 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: Does mapping elites illuminate search spaces? A large-scale user study of MAP--Elites applied to human--AI collaborative designComments: 17 pagesSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Neural and Evolutionary Computing (cs.NE)
Abstract: Two studies of a human-AI collaborative design tool were carried out in order to understand the influence design recommendations have on the design process. The tool investigated is based on an evolutionary algorithm attempting to design a virtual car to travel as far as possible in a fixed time. Participants were able to design their own cars, make recommendations to the algorithm and view sets of recommendations from the algorithm. The algorithm-recommended sets were designs which had been previously tested; some sets were simply randomly picked and other sets were picked using MAP-Elites. In the first study 808 design sessions were recorded as part of a science outreach program, each with analytical data of how each participant used the tool. To provide context to this quantitative data, a smaller double-blind lab study was also carried out with 12 participants. In the lab study the same quantitative data from the large scale study was collected alongside responses to interview questions. Although there is some evidence that the MAP-Elites provide higher-quality individual recommendations, neither study provides convincing evidence that these recommendations have a more positive influence on the design process than simply a random selection of designs. In fact, it seems that providing a combination of MAP-Elites and randomly selected recommendations is beneficial to the process. Furthermore, simply viewing recommendations from the MAP-Elites had a positive influence on engagement in the design task and the quality of the final design produced. Our findings are significant both for researchers designing new mixed-initiative tools, and those who wish to evaluate existing tools. Most significantly, we found that metrics researchers currently use to evaluate the success of human-AI collaborative algorithms do not measure the full influence these algorithms have on the design process.
- [1354] arXiv:2402.07912 (cross-list from cs.HC) [ pdf , ps , html , other ]
-
Title: Spatial Computing: Concept, Applications, Challenges and Future DirectionsGokul Yenduri , Ramalingam M , Praveen Kumar Reddy Maddikunta , Thippa Reddy Gadekallu , Rutvij H Jhaveri , Ajay Bandi , Junxin Chen , Wei Wang , Adarsh Arunkumar Shirawalmath , Raghav Ravishankar , Weizheng WangComments: Submitted to peer revieweSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Spatial computing is a technological advancement that facilitates the seamless integration of devices into the physical environment, resulting in a more natural and intuitive digital world user experience. Spatial computing has the potential to become a significant advancement in the field of computing. From GPS and location-based services to healthcare, spatial computing technologies have influenced and improved our interactions with the digital world. The use of spatial computing in creating interactive digital environments has become increasingly popular and effective. This is explained by its increasing significance among researchers and industrial organisations, which motivated us to conduct this review. This review provides a detailed overview of spatial computing, including its enabling technologies and its impact on various applications. Projects related to spatial computing are also discussed. In this review, we also explored the potential challenges and limitations of spatial computing. Furthermore, we discuss potential solutions and future directions. Overall, this paper aims to provide a comprehensive understanding of spatial computing, its enabling technologies, their impact on various applications, emerging challenges, and potential solutions.
- [1355] arXiv:2402.07913 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: QACP: An Annotated Question Answering Dataset for Assisting Chinese Python Programming LearnersSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Abstract: In online learning platforms, particularly in rapidly growing computer programming courses, addressing the thousands of students' learning queries requires considerable human cost. The creation of intelligent assistant large language models (LLMs) tailored for programming education necessitates distinct data support. However, in real application scenarios, the data resources for training such LLMs are relatively scarce. Therefore, to address the data scarcity in intelligent educational systems for programming, this paper proposes a new Chinese question-and-answer dataset for Python learners. To ensure the authenticity and reliability of the sources of the questions, we collected questions from actual student questions and categorized them according to various dimensions such as the type of questions and the type of learners. This annotation principle is designed to enhance the effectiveness and quality of online programming education, providing a solid data foundation for developing the programming teaching assists (TA). Furthermore, we conducted comprehensive evaluations of various LLMs proficient in processing and generating Chinese content, highlighting the potential limitations of general LLMs as intelligent teaching assistants in computer programming courses.
- [1356] arXiv:2402.07919 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: How Can Generative AI Enhance the Well-being of Blind?Comments: 7 pagesSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: This paper examines the question of how generative AI can improve the well-being of blind or visually impaired people. It refers to a current example, the Be My Eyes app, in which the Be My AI feature was integrated in 2023, which is based on GPT-4 from OpenAI. The author's tests are described and evaluated. There is also an ethical and social discussion. The power of the tool, which can analyze still images in an amazing way, is demonstrated. Those affected gain a new independence and a new perception of their environment. At the same time, they are dependent on the world view and morality of the provider or developer, who prescribe or deny them certain descriptions. An outlook makes it clear that the analysis of moving images will mean a further leap forward. It is fair to say that generative AI can fundamentally improve the well-being of blind and visually impaired people and will change it in various ways.
- [1357] arXiv:2402.07922 (cross-list from cs.HC) [ pdf , ps , html , other ]
-
Title: Towards the Human Digital Twin: Definition and Design -- A surveyComments: This paper is an extension of the following paper: Lauer-Schmaltz MW, Cash P, Hansen JP, Maier A. Designing Human Digital Twins for Behaviour-Changing Therapy and Rehabilitation: A Systematic Review. Proceedings of the Design Society. 2022;2:1303-1312. doi: https://doi.org/10.1017/pds.2022.132Subjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Databases (cs.DB)
Abstract: Human Digital Twins (HDTs) are a fast-emerging technology with significant potential in fields ranging from healthcare to sports. HDTs extend the traditional understanding of Digital Twins by representing humans as the underlying physical entity. This has introduced several significant challenges, including ambiguity in the definition of HDTs and a lack of guidance for their design. This survey brings together the recent advances in the field of HDTs to guide future developers by proposing a first cross-domain definition of HDTs based on their characteristics, as well as eleven key design considerations that emerge from the associated challenges.
- [1358] arXiv:2402.07928 (cross-list from cs.HC) [ pdf , ps , html , other ]
-
Title: Abstracted Trajectory Visualization for Explainability in Reinforcement LearningComments: 14pages, 11figuresSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Explainable AI (XAI) has demonstrated the potential to help reinforcement learning (RL) practitioners to understand how RL models work. However, XAI for users who do not have RL expertise (non-RL experts), has not been studied sufficiently. This results in a difficulty for the non-RL experts to participate in the fundamental discussion of how RL models should be designed for an incoming society where humans and AI coexist. Solving such a problem would enable RL experts to communicate with the non-RL experts in producing machine learning solutions that better fit our society. We argue that abstracted trajectories, that depicts transitions between the major states of the RL model, will be useful for non-RL experts to build a mental model of the agents. Our early results suggest that by leveraging a visualization of the abstracted trajectories, users without RL expertise are able to infer the behavior patterns of RL.
- [1359] arXiv:2402.07932 (cross-list from cs.HC) [ pdf , ps , html , other ]
-
Title: A Human-Machine Collaboration Framework for the Development of SchemasComments: 14 pagesSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: The Winograd Schema Challenge (WSC), a seemingly well-thought-out test for machine intelligence, has been proposed to shed light on developing systems that exhibit human behavior. Since its introduction, it aimed to pivot the focus of the AI community from the technology to the science of AI. While common and trivial for humans, studies show that it is still challenging for machines, especially when they have to deal with novel schemas, that is, well-designed sentences that require the resolving of definite pronouns. As researchers have become increasingly interested in the challenge itself, this presumably necessitates the availability of an extensive collection of Winograd schemas, which goes beyond what human experts can reasonably develop themselves, especially after proposed ways of utilizing them as novel forms of CAPTCHAs.
To address this necessity, we propose a novel framework that explicitly focuses on how humans and machines can collaborate as teammates to design novel schemas from scratch. This is being accomplished by combining two recent studies from the literature: i) Winventor, a machine-driven approach for the development of large amounts of Winograd schemas, albeit not of high quality, and ii) WinoFlexi, an online crowdsourcing system that allows crowd workers to develop a limited number of schemas often of similar quality to that of experts. Our proposal crafts a new road map toward developing a novel collaborative platform that amplifies human and machine intelligence by combining their complementary strengths. - [1360] arXiv:2402.07933 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: Human-Centered AI Product Prototyping with No-Code AutoML: Conceptual Framework, Potentials and LimitationsSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Software Engineering (cs.SE)
Abstract: This paper evaluates No-Code AutoML as a solution for challenges in AI product prototyping, characterized by unpredictability and inaccessibility to non-experts, and proposes a conceptual framework. This complexity of AI products hinders seamless execution and interdisciplinary collaboration crucial for human-centered AI products. Relevant to industry and innovation, it affects strategic decision-making and investment risk mitigation. Current approaches provide limited insights into the potential and feasibility of AI product ideas. Employing Design Science Research, the study identifies challenges and integrates no-code AutoML as a solution by presenting a framework for AI product prototyping with No-code AutoML. A case study confirms its potential in supporting non-experts, offering a structured approach to AI product development. The framework facilitates accessible and interpretable prototyping, benefiting academia, managers, and decision-makers. Strategic integration of no-code AutoML enhances efficiency, empowers non-experts, and informs early-stage decisions, albeit with acknowledged limitations.
- [1361] arXiv:2402.07938 (cross-list from cs.HC) [ pdf , ps , html , other ]
-
Title: Large Language User Interfaces: Voice Interactive User Interfaces powered by LLMsComments: Accepted as peer-reviewed publicationSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: The evolution of Large Language Models (LLMs) has showcased remarkable capacities for logical reasoning and natural language comprehension. These capabilities can be leveraged in solutions that semantically and textually model complex problems. In this paper, we present our efforts toward constructing a framework that can serve as an intermediary between a user and their user interface (UI), enabling dynamic and real-time interactions. We employ a system that stands upon textual semantic mappings of UI components, in the form of annotations. These mappings are stored, parsed, and scaled in a custom data structure, supplementary to an agent-based prompting backend engine. Employing textual semantic mappings allows each component to not only explain its role to the engine but also provide expectations. By comprehending the needs of both the user and the components, our LLM engine can classify the most appropriate application, extract relevant parameters, and subsequently execute precise predictions of the user's expected actions. Such an integration evolves static user interfaces into highly dynamic and adaptable solutions, introducing a new frontier of intelligent and responsive user experiences.
- [1362] arXiv:2402.07939 (cross-list from cs.HC) [ pdf , ps , html , other ]
-
Title: UFO: A UI-Focused Agent for Windows OS InteractionChaoyun Zhang , Liqun Li , Shilin He , Xu Zhang , Bo Qiao , Si Qin , Minghua Ma , Yu Kang , Qingwei Lin , Saravan Rajmohan , Dongmei Zhang , Qi ZhangSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: We introduce UFO, an innovative UI-Focused agent to fulfill user requests tailored to applications on Windows OS, harnessing the capabilities of GPT-Vision. UFO employs a dual-agent framework to meticulously observe and analyze the graphical user interface (GUI) and control information of Windows applications. This enables the agent to seamlessly navigate and operate within individual applications and across them to fulfill user requests, even when spanning multiple applications. The framework incorporates a control interaction module, facilitating action grounding without human intervention and enabling fully automated execution. Consequently, UFO transforms arduous and time-consuming processes into simple tasks achievable solely through natural language commands. We conducted testing of UFO across 9 popular Windows applications, encompassing a variety of scenarios reflective of users' daily usage. The results, derived from both quantitative metrics and real-case studies, underscore the superior effectiveness of UFO in fulfilling user requests. To the best of our knowledge, UFO stands as the first UI agent specifically tailored for task completion within the Windows OS environment. The open-source code for UFO is available on this https URL .
- [1363] arXiv:2402.07940 (cross-list from cs.HC) [ pdf , ps , html , other ]
-
Title: LLMs Among Us: Generative AI Participating in Digital DiscourseSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Social and Information Networks (cs.SI)
Abstract: The emergence of Large Language Models (LLMs) has great potential to reshape the landscape of many social media platforms. While this can bring promising opportunities, it also raises many threats, such as biases and privacy concerns, and may contribute to the spread of propaganda by malicious actors. We developed the "LLMs Among Us" experimental framework on top of the Mastodon social media platform for bot and human participants to communicate without knowing the ratio or nature of bot and human participants. We built 10 personas with three different LLMs, GPT-4, LLama 2 Chat, and Claude. We conducted three rounds of the experiment and surveyed participants after each round to measure the ability of LLMs to pose as human participants without human detection. We found that participants correctly identified the nature of other users in the experiment only 42% of the time despite knowing the presence of both bots and humans. We also found that the choice of persona had substantially more impact on human perception than the choice of mainstream LLMs.
- [1364] arXiv:2402.07945 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: ScreenAgent: A Vision Language Model-driven Computer Control AgentRunliang Niu , Jindong Li , Shiqi Wang , Yali Fu , Xiyu Hu , Xueyuan Leng , He Kong , Yi Chang , Qi WangSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Existing Large Language Models (LLM) can invoke a variety of tools and APIs to complete complex tasks. The computer, as the most powerful and universal tool, could potentially be controlled directly by a trained LLM agent. Powered by the computer, we can hopefully build a more generalized agent to assist humans in various daily digital works. In this paper, we construct an environment for a Vision Language Model (VLM) agent to interact with a real computer screen. Within this environment, the agent can observe screenshots and manipulate the Graphics User Interface (GUI) by outputting mouse and keyboard actions. We also design an automated control pipeline that includes planning, acting, and reflecting phases, guiding the agent to continuously interact with the environment and complete multi-step tasks. Additionally, we construct the ScreenAgent Dataset, which collects screenshots and action sequences when completing a variety of daily computer tasks. Finally, we trained a model, ScreenAgent, which achieved computer control capabilities comparable to GPT-4V and demonstrated more precise UI positioning capabilities. Our attempts could inspire further research on building a generalist LLM agent. The code is available at \url{ this https URL }.
- [1365] arXiv:2402.07946 (cross-list from cs.HC) [ pdf , ps , html , other ]
-
Title: Re-Envisioning Command and ControlComments: Accepted at the NATO Science and Technology Organization Symposium (ICMCIS) organized by the Information Systems Technology (IST) Panel, IST-205-RSY - the ICMCIS, held in Koblenz, Germany, 23-24 April 2024Subjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Future warfare will require Command and Control (C2) decision-making to occur in more complex, fast-paced, ill-structured, and demanding conditions. C2 will be further complicated by operational challenges such as Denied, Degraded, Intermittent, and Limited (DDIL) communications and the need to account for many data streams, potentially across multiple domains of operation. Yet, current C2 practices -- which stem from the industrial era rather than the emerging intelligence era -- are linear and time-consuming. Critically, these approaches may fail to maintain overmatch against adversaries on the future battlefield. To address these challenges, we propose a vision for future C2 based on robust partnerships between humans and artificial intelligence (AI) systems. This future vision is encapsulated in three operational impacts: streamlining the C2 operations process, maintaining unity of effort, and developing adaptive collective knowledge systems. This paper illustrates the envisaged future C2 capabilities, discusses the assumptions that shaped them, and describes how the proposed developments could transform C2 in future warfare.
- [1366] arXiv:2402.07949 (cross-list from q-bio.QM) [ pdf , ps , html , other ]
-
Title: Optimizing the Design of an Artificial Pancreas to Improve Diabetes ManagementSubjects: Quantitative Methods (q-bio.QM) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Abstract: Diabetes, a chronic condition that impairs how the body turns food into energy, i.e. blood glucose, affects 38 million people in the US alone. The standard treatment is to supplement carbohydrate intake with an artificial pancreas, i.e. a continuous insulin pump (basal shots), as well as occasional insulin injections (bolus shots). The goal of the treatment is to keep blood glucose at the center of an acceptable range, as measured through a continuous glucose meter. A secondary goal is to minimize injections, which are unpleasant and difficult for some patients to implement. In this study, neuroevolution was used to discover an optimal strategy for the treatment. Based on a dataset of 30 days of treatment and measurements of a single patient, a random forest was first trained to predict future glucose levels. A neural network was then evolved to prescribe carbohydrates, basal pumping levels, and bolus injections. Evolution discovered a Pareto front that reduced deviation from the target and number of injections compared to the original data, thus improving patients' quality of life. To make the system easier to adopt, a language interface was developed with a large language model. Thus, these technologies not only improve patient care but also adoption in a broader population.
- [1367] arXiv:2402.07956 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: Educational data mining and learning analytics: An updated surveyJournal-ref: Wiley interdisciplinary reviews: Data mining and knowledge discovery;2020; 10(3):e1355Subjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: This survey is an updated and improved version of the previous one published in 2013 in this journal with the title data mining in education. It reviews in a comprehensible and very general way how Educational Data Mining and Learning Analytics have been applied over educational data. In the last decade, this research area has evolved enormously and a wide range of related terms are now used in the bibliography such as Academic Analytics, Institutional Analytics, Teaching Analytics, Data-Driven Education, Data-Driven Decision-Making in Education, Big Data in Education, and Educational Data Science. This paper provides the current state of the art by reviewing the main publications, the key milestones, the knowledge discovery cycle, the main educational environments, the specific tools, the free available datasets, the most used methods, the main objectives, and the future trends in this research area.
- [1368] arXiv:2402.08010 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Which Frequencies do CNNs Need? Emergent Bottleneck Structure in Feature LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: We describe the emergence of a Convolution Bottleneck (CBN) structure in CNNs, where the network uses its first few layers to transform the input representation into a representation that is supported only along a few frequencies and channels, before using the last few layers to map back to the outputs. We define the CBN rank, which describes the number and type of frequencies that are kept inside the bottleneck, and partially prove that the parameter norm required to represent a function $f$ scales as depth times the CBN rank $f$. We also show that the parameter norm depends at next order on the regularity of $f$. We show that any network with almost optimal parameter norm will exhibit a CBN structure in both the weights and - under the assumption that the network is stable under large learning rate - the activations, which motivates the common practice of down-sampling; and we verify that the CBN results still hold with down-sampling. Finally we use the CBN structure to interpret the functions learned by CNNs on a number of tasks.
- [1369] arXiv:2402.08023 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: UGMAE: A Unified Framework for Graph Masked AutoencodersSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Generative self-supervised learning on graphs, particularly graph masked autoencoders, has emerged as a popular learning paradigm and demonstrated its efficacy in handling non-Euclidean data. However, several remaining issues limit the capability of existing methods: 1) the disregard of uneven node significance in masking, 2) the underutilization of holistic graph information, 3) the ignorance of semantic knowledge in the representation space due to the exclusive use of reconstruction loss in the output space, and 4) the unstable reconstructions caused by the large volume of masked contents. In light of this, we propose UGMAE, a unified framework for graph masked autoencoders to address these issues from the perspectives of adaptivity, integrity, complementarity, and consistency. Specifically, we first develop an adaptive feature mask generator to account for the unique significance of nodes and sample informative masks (adaptivity). We then design a ranking-based structure reconstruction objective joint with feature reconstruction to capture holistic graph information and emphasize the topological proximity between neighbors (integrity). After that, we present a bootstrapping-based similarity module to encode the high-level semantic knowledge in the representation space, complementary to the low-level reconstruction in the output space (complementarity). Finally, we build a consistency assurance module to provide reconstruction objectives with extra stabilized consistency targets (consistency). Extensive experiments demonstrate that UGMAE outperforms both contrastive and generative state-of-the-art baselines on several tasks across multiple datasets.
- [1370] arXiv:2402.08030 (cross-list from cs.HC) [ pdf , ps , html , other ]
-
Title: Why and When LLM-Based Assistants Can Go Wrong: Investigating the Effectiveness of Prompt-Based Interactions for Software Help-SeekingComments: Accepted for publication in the Proceedings of the 29th International Conference on Intelligent User Interfaces (IUI'24), March 18--21, 2024, in Greenville, SC, USASubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Large Language Model (LLM) assistants, such as ChatGPT, have emerged as potential alternatives to search methods for helping users navigate complex, feature-rich software. LLMs use vast training data from domain-specific texts, software manuals, and code repositories to mimic human-like interactions, offering tailored assistance, including step-by-step instructions. In this work, we investigated LLM-generated software guidance through a within-subject experiment with 16 participants and follow-up interviews. We compared a baseline LLM assistant with an LLM optimized for particular software contexts, SoftAIBot, which also offered guidelines for constructing appropriate prompts. We assessed task completion, perceived accuracy, relevance, and trust. Surprisingly, although SoftAIBot outperformed the baseline LLM, our results revealed no significant difference in LLM usage and user perceptions with or without prompt guidelines and the integration of domain context. Most users struggled to understand how the prompt's text related to the LLM's responses and often followed the LLM's suggestions verbatim, even if they were incorrect. This resulted in difficulties when using the LLM's advice for software tasks, leading to low task completion rates. Our detailed analysis also revealed that users remained unaware of inaccuracies in the LLM's responses, indicating a gap between their lack of software expertise and their ability to evaluate the LLM's assistance. With the growing push for designing domain-specific LLM assistants, we emphasize the importance of incorporating explainable, context-aware cues into LLMs to help users understand prompt-based interactions, identify biases, and maximize the utility of LLM assistants.
- [1371] arXiv:2402.08062 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Avoiding Catastrophe in Continuous Spaces by Asking for HelpSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Most reinforcement learning algorithms with formal regret guarantees assume all mistakes are reversible and rely on essentially trying all possible options. This approach leads to poor outcomes when some mistakes are irreparable or even catastrophic. We propose a variant of the contextual bandit problem where the goal is to minimize the chance of catastrophe. Specifically, we assume that the payoff each round represents the chance of avoiding catastrophe that round, and try to maximize the product of payoffs (the overall chance of avoiding catastrophe). To give the agent some chance of success, we allow a limited number of queries to a mentor and assume a Lipschitz continuous payoff function. We present an algorithm whose regret and rate of querying the mentor both approach 0 as the time horizon grows, assuming a continuous 1D state space and a relatively "simple" payoff function. We also provide a matching lower bound: without the simplicity assumption: any algorithm either constantly asks for help or is nearly guaranteed to cause catastrophe. Finally, we identify the key obstacle to generalizing our algorithm to a multi-dimensional state space.
- [1372] arXiv:2402.08072 (cross-list from cs.HC) [ pdf , ps , html , other ]
-
Title: Enhancing Programming Error Messages in Real Time with Generative AIBailey Kimmel , Austin Geisert , Lily Yaro , Brendan Gipson , Taylor Hotchkiss , Sidney Osae-Asante , Hunter Vaught , Grant Wininger , Chase YamaguchiComments: Accepted to CHI 2024Subjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Generative AI is changing the way that many disciplines are taught, including computer science. Researchers have shown that generative AI tools are capable of solving programming problems, writing extensive blocks of code, and explaining complex code in simple terms. Particular promise has been shown in using generative AI to enhance programming error messages. Both students and instructors have complained for decades that these messages are often cryptic and difficult to understand. Yet recent work has shown that students make fewer repeated errors when enhanced via GPT-4. We extend this work by implementing feedback from ChatGPT for all programs submitted to our automated assessment tool, Athene, providing help for compiler, run-time, and logic errors. Our results indicate that adding generative AI to an automated assessment tool does not necessarily make it better and that design of the interface matters greatly to the usability of the feedback that GPT-4 provided.
- [1373] arXiv:2402.08075 (cross-list from q-bio.GN) [ pdf , ps , html , other ]
-
Title: Efficient and Scalable Fine-Tune of Language Models for Genome UnderstandingSubjects: Genomics (q-bio.GN) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Although DNA foundation models have advanced the understanding of genomes, they still face significant challenges in the limited scale and diversity of genomic data. This limitation starkly contrasts with the success of natural language foundation models, which thrive on substantially larger scales. Furthermore, genome understanding involves numerous downstream genome annotation tasks with inherent data heterogeneity, thereby necessitating more efficient and robust fine-tuning methods tailored for genomics. Here, we present \textsc{Lingo}: \textsc{L}anguage prefix f\textsc{In}e-tuning for \textsc{G}en\textsc{O}mes. Unlike DNA foundation models, \textsc{Lingo} strategically leverages natural language foundation models' contextual cues, recalibrating their linguistic knowledge to genomic sequences. \textsc{Lingo} further accommodates numerous, heterogeneous downstream fine-tune tasks by an adaptive rank sampling method that prunes and stochastically reintroduces pruned singular vectors within small computational budgets. Adaptive rank sampling outperformed existing fine-tuning methods on all benchmarked 14 genome understanding tasks, while requiring fewer than 2\% of trainable parameters as genomic-specific adapters. Impressively, applying these adapters on natural language foundation models matched or even exceeded the performance of DNA foundation models. \textsc{Lingo} presents a new paradigm of efficient and scalable genome understanding via genomic-specific adapters on language models.
- [1374] arXiv:2402.08085 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Message Detouring: A Simple Yet Effective Cycle Representation for Expressive Graph LearningComments: 16 pages, 5 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computational Geometry (cs.CG)
Abstract: Graph learning is crucial in the fields of bioinformatics, social networks, and chemicals. Although high-order graphlets, such as cycles, are critical to achieving an informative graph representation for node classification, edge prediction, and graph recognition, modeling high-order topological characteristics poses significant computational challenges, restricting its widespread applications in machine learning. To address this limitation, we introduce the concept of \textit{message detouring} to hierarchically characterize cycle representation throughout the entire graph, which capitalizes on the contrast between the shortest and longest pathways within a range of local topologies associated with each graph node. The topological feature representations derived from our message detouring landscape demonstrate comparable expressive power to high-order \textit{Weisfeiler-Lehman} (WL) tests but much less computational demands. In addition to the integration with graph kernel and message passing neural networks, we present a novel message detouring neural network, which uses Transformer backbone to integrate cycle representations across nodes and edges. Aside from theoretical results, experimental results on expressiveness, graph classification, and node classification show message detouring can significantly outperform current counterpart approaches on various benchmark datasets.
- [1375] arXiv:2402.08112 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: A Competition Winning Deep Reinforcement Learning Agent in microRTSComments: 26 pages, 4 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Scripted agents have predominantly won the five previous iterations of the IEEE microRTS ($\mu$RTS) competitions hosted at CIG and CoG. Despite Deep Reinforcement Learning (DRL) algorithms making significant strides in real-time strategy (RTS) games, their adoption in this primarily academic competition has been limited due to the considerable training resources required and the complexity inherent in creating and debugging such agents. RAISocketAI is the first DRL agent to win the IEEE microRTS competition. In a benchmark without performance constraints, RAISocketAI regularly defeated the two prior competition winners. This first competition-winning DRL submission can be a benchmark for future microRTS competitions and a starting point for future DRL research. Iteratively fine-tuning the base policy and transfer learning to specific maps were critical to RAISocketAI's winning performance. These strategies can be used to economically train future DRL agents. Further work in Imitation Learning using Behavior Cloning and fine-tuning these models with DRL has proven promising as an efficient way to bootstrap models with demonstrated, competitive behaviors.
- [1376] arXiv:2402.08114 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Active Preference Learning for Large Language ModelsComments: 13 pages, 5 figures, 6 tablesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: As large language models (LLMs) become more capable, fine-tuning techniques for aligning with human intent are increasingly important. A key consideration for aligning these models is how to most effectively use human resources, or model resources in the case where LLMs themselves are used as oracles. Reinforcement learning from Human or AI preferences (RLHF/RLAIF) is the most prominent example of such a technique, but is complex and often unstable. Direct Preference Optimization (DPO) has recently been proposed as a simpler and more stable alternative. In this work, we develop an active learning strategy for DPO to make better use of preference labels. We propose a practical acquisition function for prompt/completion pairs based on the predictive entropy of the language model and a measure of certainty of the implicit preference model optimized by DPO. We demonstrate how our approach improves both the rate of learning and final performance of fine-tuning on pairwise preference data.
- [1377] arXiv:2402.08125 (cross-list from cs.RO) [ pdf , ps , html , other ]
-
Title: Customizable Perturbation Synthesis for Robust SLAM BenchmarkingXiaohao Xu , Tianyi Zhang , Sibo Wang , Xiang Li , Yongqi Chen , Ye Li , Bhiksha Raj , Matthew Johnson-Roberson , Xiaonan HuangComments: 40 pagesSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Abstract: Robustness is a crucial factor for the successful deployment of robots in unstructured environments, particularly in the domain of Simultaneous Localization and Mapping (SLAM). Simulation-based benchmarks have emerged as a highly scalable approach for robustness evaluation compared to real-world data collection. However, crafting a challenging and controllable noisy world with diverse perturbations remains relatively under-explored. To this end, we propose a novel, customizable pipeline for noisy data synthesis, aimed at assessing the resilience of multi-modal SLAM models against various perturbations. This pipeline incorporates customizable hardware setups, software components, and perturbed environments. In particular, we introduce comprehensive perturbation taxonomy along with a perturbation composition toolbox, allowing the transformation of clean simulations into challenging noisy environments. Utilizing the pipeline, we instantiate the Robust-SLAM benchmark, which includes diverse perturbation types, to evaluate the risk tolerance of existing advanced multi-modal SLAM models. Our extensive analysis uncovers the susceptibilities of existing SLAM models to real-world disturbance, despite their demonstrated accuracy in standard benchmarks. Our perturbation synthesis toolbox, SLAM robustness evaluation pipeline, and Robust-SLAM benchmark will be made publicly available at this https URL .
- [1378] arXiv:2402.08143 (cross-list from econ.GN) [ pdf , ps , other ]
-
Title: Artificial intelligence and the transformation of higher education institutionsSubjects: General Economics (econ.GN) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: Artificial intelligence (AI) advances and the rapid adoption of generative AI tools like ChatGPT present new opportunities and challenges for higher education. While substantial literature discusses AI in higher education, there is a lack of a systemic approach that captures a holistic view of the AI transformation of higher education institutions (HEIs). To fill this gap, this article, taking a complex systems approach, develops a causal loop diagram (CLD) to map the causal feedback mechanisms of AI transformation in a typical HEI. Our model accounts for the forces that drive the AI transformation and the consequences of the AI transformation on value creation in a typical HEI. The article identifies and analyzes several reinforcing and balancing feedback loops, showing how, motivated by AI technology advances, the HEI invests in AI to improve student learning, research, and administration. The HEI must take measures to deal with academic integrity problems and adapt to changes in available jobs due to AI, emphasizing AI-complementary skills for its students. However, HEIs face a competitive threat and several policy traps that may lead to decline. HEI leaders need to become systems thinkers to manage the complexity of the AI transformation and benefit from the AI feedback loops while avoiding the associated pitfalls. We also discuss long-term scenarios, the notion of HEIs influencing the direction of AI, and directions for future research on AI transformation.
- [1379] arXiv:2402.08144 (cross-list from cs.GT) [ pdf , ps , other ]
-
Title: Average-Case Analysis of Iterative VotingComments: 75 pages, 2 figuresSubjects: Computer Science and Game Theory (cs.GT) ; Artificial Intelligence (cs.AI)
Abstract: Iterative voting is a natural model of repeated strategic decision-making in social choice when agents have the opportunity to update their votes prior to finalizing the group decision. Prior work has analyzed the efficacy of iterative plurality on the welfare of the chosen outcome at equilibrium, relative to the truthful vote profile, via an adaptation of the price of anarchy. However, prior analyses have only studied the worst- and average-case performances when agents' preferences are distributed by the impartial culture. This work extends average-case analysis to a wider class of distributions and distinguishes when iterative plurality improves or degrades asymptotic welfare.
- [1380] arXiv:2402.08147 (cross-list from cs.SE) [ pdf , ps , other ]
-
Title: Verified Multi-Step Synthesis using Large Language Models and Monte Carlo Tree SearchDavid Brandfonbrener , Sibi Raja , Tarun Prasad , Chloe Loughridge , Jianang Yang , Simon Henniger , William E. Byrd , Robert Zinkov , Nada AminSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Logic in Computer Science (cs.LO); Programming Languages (cs.PL)
Abstract: We present an approach using Monte Carlo Tree Search (MCTS) to guide Large Language Models (LLMs) to generate verified programs in Dafny, Lean and Coq. Our method, which we call VMCTS, leverages the verifier inside the search algorithm by checking partial programs at each step. In combination with the LLM prior, the verifier feedback raises the synthesis capabilities of open source models. On a set of five verified programming problems, we find that in four problems where the base model cannot solve the question even when re-sampling solutions for one hour, VMCTS can solve the problems within 6 minutes. The base model with VMCTS is even competitive with ChatGPT4 augmented with plugins and multiple re-tries on these problems. Our code and benchmarks are available at this https URL .
- [1381] arXiv:2402.08151 (cross-list from stat.ME) [ pdf , ps , html , other ]
-
Title: Gradient-flow adaptive importance sampling for Bayesian leave one out cross-validation for sigmoidal classification modelsComments: SubmittedSubjects: Methodology (stat.ME) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Spectral Theory (math.SP); Statistics Theory (math.ST)
Abstract: We introduce a set of gradient-flow-guided adaptive importance sampling (IS) transformations to stabilize Monte-Carlo approximations of point-wise leave one out cross-validated (LOO) predictions for Bayesian classification models. One can leverage this methodology for assessing model generalizability by for instance computing a LOO analogue to the AIC or computing LOO ROC/PRC curves and derived metrics like the AUROC and AUPRC. By the calculus of variations and gradient flow, we derive two simple nonlinear single-step transformations that utilize gradient information to shift a model's pre-trained full-data posterior closer to the target LOO posterior predictive distributions. In doing so, the transformations stabilize importance weights. Because the transformations involve the gradient of the likelihood function, the resulting Monte Carlo integral depends on Jacobian determinants with respect to the model Hessian. We derive closed-form exact formulae for these Jacobian determinants in the cases of logistic regression and shallow ReLU-activated artificial neural networks, and provide a simple approximation that sidesteps the need to compute full Hessian matrices and their spectra. We test the methodology on an $n\ll p$ dataset that is known to produce unstable LOO IS weights.
- [1382] arXiv:2402.08155 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: CMA-R:Causal Mediation Analysis for Explaining Rumour DetectionComments: 9 pages, 7 figures, Accepted by EACL 2024 FindingsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: We apply causal mediation analysis to explain the decision-making process of neural models for rumour detection on Twitter. Interventions at the input and network level reveal the causal impacts of tweets and words in the model output. We find that our approach CMA-R -- Causal Mediation Analysis for Rumour detection -- identifies salient tweets that explain model predictions and show strong agreement with human judgements for critical tweets determining the truthfulness of stories. CMA-R can further highlight causally impactful words in the salient tweets, providing another layer of interpretability and transparency into these blackbox rumour detection systems. Code is available at: this https URL .
- [1383] arXiv:2402.08156 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Group Decision-Making among Privacy-Aware AgentsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
Abstract: How can individuals exchange information to learn from each other despite their privacy needs and security concerns? For example, consider individuals deliberating a contentious topic and being concerned about divulging their private experiences. Preserving individual privacy and enabling efficient social learning are both important desiderata but seem fundamentally at odds with each other and very hard to reconcile. We do so by controlling information leakage using rigorous statistical guarantees that are based on differential privacy (DP). Our agents use log-linear rules to update their beliefs after communicating with their neighbors. Adding DP randomization noise to beliefs provides communicating agents with plausible deniability with regard to their private information and their network neighborhoods. We consider two learning environments one for distributed maximum-likelihood estimation given a finite number of private signals and another for online learning from an infinite, intermittent signal stream. Noisy information aggregation in the finite case leads to interesting tradeoffs between rejecting low-quality states and making sure all high-quality states are accepted in the algorithm output. Our results flesh out the nature of the trade-offs in both cases between the quality of the group decision outcomes, learning accuracy, communication cost, and the level of privacy protections that the agents are afforded.
- [1384] arXiv:2402.08164 (cross-list from stat.ML) [ pdf , ps , html , other ]
-
Title: On Limitations of the Transformer ArchitectureSubjects: Machine Learning (stat.ML) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: What are the root causes of hallucinations in large language models (LLMs)? We use Communication Complexity to prove that the Transformer layer is incapable of composing functions (e.g., identify a grandparent of a person in a genealogy) if the domains of the functions are large enough; we show through examples that this inability is already empirically present when the domains are quite small. We also point out that several mathematical tasks that are at the core of the so-called compositional tasks thought to be hard for LLMs are unlikely to be solvable by Transformers, for large enough instances and assuming that certain well accepted conjectures in the field of Computational Complexity are true.
- [1385] arXiv:2402.08170 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: LLaGA: Large Language and Graph AssistantSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Graph Neural Networks (GNNs) have empowered the advance in graph-structured data analysis. Recently, the rise of Large Language Models (LLMs) like GPT-4 has heralded a new era in deep learning. However, their application to graph data poses distinct challenges due to the inherent difficulty of translating graph structures to language. To this end, we introduce the Large Language and Graph Assistant (LLaGA), an innovative model that effectively integrates LLM capabilities to handle the complexities of graph-structured data. LLaGA retains the general-purpose nature of LLMs while adapting graph data into a format compatible with LLM input. LLaGA achieves this by reorganizing graph nodes to structure-aware sequences and then mapping these into the token embedding space through a versatile projector. LLaGA excels in versatility, generalizability and interpretability, allowing it to perform consistently well across different datasets and tasks, extend its ability to unseen datasets or tasks, and provide explanations for graphs. Our extensive experiments across popular graph benchmarks show that LLaGA delivers outstanding performance across four datasets and three tasks using one single model, surpassing state-of-the-art graph models in both supervised and zero-shot scenarios. Our code is available at \url{ this https URL }.
- [1386] arXiv:2402.08191 (cross-list from cs.RO) [ pdf , ps , html , other ]
-
Title: THE COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic ManipulationComments: 30 pagesSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: To realize effective large-scale, real-world robotic applications, we must evaluate how well our robot policies adapt to changes in environmental conditions. Unfortunately, a majority of studies evaluate robot performance in environments closely resembling or even identical to the training setup. We present THE COLOSSEUM, a novel simulation benchmark, with 20 diverse manipulation tasks, that enables systematical evaluation of models across 12 axes of environmental perturbations. These perturbations include changes in color, texture, and size of objects, table-tops, and backgrounds; we also vary lighting, distractors, and camera pose. Using THE COLOSSEUM, we compare 4 state-of-the-art manipulation models to reveal that their success rate degrades between 30-50% across these perturbation factors. When multiple perturbations are applied in unison, the success rate degrades $\geq$75%. We identify that changing the number of distractor objects, target object color, or lighting conditions are the perturbations that reduce model performance the most. To verify the ecological validity of our results, we show that our results in simulation are correlated ($\bar{R}^2 = 0.614$) to similar perturbations in real-world experiments. We open source code for others to use THE COLOSSEUM, and also release code to 3D print the objects used to replicate the real-world perturbations. Ultimately, we hope that THE COLOSSEUM will serve as a benchmark to identify modeling decisions that systematically improve generalization for manipulation. See this https URL for more details.
- [1387] arXiv:2402.08198 (cross-list from q-bio.BM) [ pdf , ps , html , other ]
-
Title: PSC-CPI: Multi-Scale Protein Sequence-Structure Contrasting for Efficient and Generalizable Compound-Protein Interaction PredictionLirong Wu , Yufei Huang , Cheng Tan , Zhangyang Gao , Bozhen Hu , Haitao Lin , Zicheng Liu , Stan Z. LiSubjects: Biomolecules (q-bio.BM) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Compound-Protein Interaction (CPI) prediction aims to predict the pattern and strength of compound-protein interactions for rational drug discovery. Existing deep learning-based methods utilize only the single modality of protein sequences or structures and lack the co-modeling of the joint distribution of the two modalities, which may lead to significant performance drops in complex real-world scenarios due to various factors, e.g., modality missing and domain shifting. More importantly, these methods only model protein sequences and structures at a single fixed scale, neglecting more fine-grained multi-scale information, such as those embedded in key protein fragments. In this paper, we propose a novel multi-scale Protein Sequence-structure Contrasting framework for CPI prediction (PSC-CPI), which captures the dependencies between protein sequences and structures through both intra-modality and cross-modality contrasting. We further apply length-variable protein augmentation to allow contrasting to be performed at different scales, from the amino acid level to the sequence level. Finally, in order to more fairly evaluate the model generalizability, we split the test data into four settings based on whether compounds and proteins have been observed during the training stage. Extensive experiments have shown that PSC-CPI generalizes well in all four settings, particularly in the more challenging ``Unseen-Both" setting, where neither compounds nor proteins have been observed during training. Furthermore, even when encountering a situation of modality missing, i.e., inference with only single-modality protein data, PSC-CPI still exhibits comparable or even better performance than previous approaches.
- [1388] arXiv:2402.08209 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Thresholding Data Shapley for Data Cleansing Using Multi-Armed BanditsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Data cleansing aims to improve model performance by removing a set of harmful instances from the training dataset. Data Shapley is a common theoretically guaranteed method to evaluate the contribution of each instance to model performance; however, it requires training on all subsets of the training data, which is computationally expensive. In this paper, we propose an iterativemethod to fast identify a subset of instances with low data Shapley values by using the thresholding bandit algorithm. We provide a theoretical guarantee that the proposed method can accurately select harmful instances if a sufficiently large number of iterations is conducted. Empirical evaluation using various models and datasets demonstrated that the proposed method efficiently improved the computational speed while maintaining the model performance.
- [1389] arXiv:2402.08219 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: BBox-Adapter: Lightweight Adapting for Black-Box Large Language ModelsComments: 24 pages, 10 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Adapting state-of-the-art Large Language Models (LLMs) like GPT-4 and Gemini for specific tasks is challenging. Due to the opacity in their parameters, embeddings, and even output probabilities, existing fine-tuning adaptation methods are inapplicable. Consequently, adapting these black-box LLMs is only possible through their API services, raising concerns about transparency, privacy, and cost. To address these challenges, we introduce BBox-Adapter, a novel lightweight adapter for black-box LLMs. BBox-Adapter distinguishes target and source domain data by treating target data as positive and source data as negative. It employs a ranking-based Noise Contrastive Estimation (NCE) loss to promote the likelihood of target domain data while penalizing that of the source domain. Furthermore, it features an online adaptation mechanism, which incorporates real-time positive data sampling from ground-truth, human, or AI feedback, coupled with negative data from previous adaptations. Extensive experiments demonstrate BBox-Adapter's effectiveness and cost efficiency. It improves model performance by up to 6.77% across diverse tasks and domains, while reducing training and inference costs by 31.30x and 1.84x, respectively.
- [1390] arXiv:2402.08228 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Investigating Out-of-Distribution Generalization of GNNs: An Architecture PerspectiveSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Graph neural networks (GNNs) have exhibited remarkable performance under the assumption that test data comes from the same distribution of training data. However, in real-world scenarios, this assumption may not always be valid. Consequently, there is a growing focus on exploring the Out-of-Distribution (OOD) problem in the context of graphs. Most existing efforts have primarily concentrated on improving graph OOD generalization from two \textbf{model-agnostic} perspectives: data-driven methods and strategy-based learning. However, there has been limited attention dedicated to investigating the impact of well-known \textbf{GNN model architectures} on graph OOD generalization, which is orthogonal to existing research. In this work, we provide the first comprehensive investigation of OOD generalization on graphs from an architecture perspective, by examining the common building blocks of modern GNNs. Through extensive experiments, we reveal that both the graph self-attention mechanism and the decoupled architecture contribute positively to graph OOD generalization. In contrast, we observe that the linear classification layer tends to compromise graph OOD generalization capability. Furthermore, we provide in-depth theoretical insights and discussions to underpin these discoveries. These insights have empowered us to develop a novel GNN backbone model, DGAT, designed to harness the robust properties of both graph self-attention mechanism and the decoupled architecture. Extensive experimental results demonstrate the effectiveness of our model under graph OOD, exhibiting substantial and consistent enhancements across various training strategies.
- [1391] arXiv:2402.08246 (cross-list from eess.SY) [ pdf , ps , other ]
-
Title: Ant Colony Optimization for Cooperative Inspection Path Planning Using Multiple Unmanned Aerial VehiclesComments: Published in: 2024 IEEE/SICE International Symposium on System Integration (SII)Subjects: Systems and Control (eess.SY) ; Artificial Intelligence (cs.AI)
Abstract: This paper presents a new swarm intelligence-based approach to deal with the cooperative path planning problem of unmanned aerial vehicles (UAVs), which is essential for the automatic inspection of infrastructure. The approach uses a 3D model of the structure to generate viewpoints for the UAVs. The calculation of the viewpoints considers the constraints related to the UAV formation model, camera parameters, and requirements for data post-processing. The viewpoints are then used as input to formulate the path planning as an extended traveling salesman problem and the definition of a new cost function. Ant colony optimization is finally used to solve the problem to yield optimal inspection paths. Experiments with 3D models of real structures have been conducted to evaluate the performance of the proposed approach. The results show that our system is not only capable of generating feasible inspection paths for UAVs but also reducing the path length by 29.47\% for complex structures when compared with another heuristic approach. The source code of the algorithm can be found at this https URL .
- [1392] arXiv:2402.08255 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Distal Interference: Exploring the Limits of Model-Based Continual LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Abstract: Continual learning is the sequential learning of different tasks by a machine learning model. Continual learning is known to be hindered by catastrophic interference or forgetting, i.e. rapid unlearning of earlier learned tasks when new tasks are learned. Despite their practical success, artificial neural networks (ANNs) are prone to catastrophic interference. This study analyses how gradient descent and overlapping representations between distant input points lead to distal interference and catastrophic interference. Distal interference refers to the phenomenon where training a model on a subset of the domain leads to non-local changes on other subsets of the domain. This study shows that uniformly trainable models without distal interference must be exponentially large. A novel antisymmetric bounded exponential layer B-spline ANN architecture named ABEL-Spline is proposed that can approximate any continuous function, is uniformly trainable, has polynomial computational complexity, and provides some guarantees for distal interference. Experiments are presented to demonstrate the theoretical properties of ABEL-Splines. ABEL-Splines are also evaluated on benchmark regression problems. It is concluded that the weaker distal interference guarantees in ABEL-Splines are insufficient for model-only continual learning. It is conjectured that continual learning with polynomial complexity models requires augmentation of the training data or algorithm.
- [1393] arXiv:2402.08256 (cross-list from cs.IR) [ pdf , ps , other ]
-
Title: Modeling Balanced Explicit and Implicit Relations with Contrastive Learning for Knowledge Concept Recommendation in MOOCsComments: Accepted to WWW 2024Subjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI)
Abstract: The knowledge concept recommendation in Massive Open Online Courses (MOOCs) is a significant issue that has garnered widespread attention. Existing methods primarily rely on the explicit relations between users and knowledge concepts on the MOOC platforms for recommendation. However, there are numerous implicit relations (e.g., shared interests or same knowledge levels between users) generated within the users' learning activities on the MOOC platforms. Existing methods fail to consider these implicit relations, and these relations themselves are difficult to learn and represent, causing poor performance in knowledge concept recommendation and an inability to meet users' personalized needs. To address this issue, we propose a novel framework based on contrastive learning, which can represent and balance the explicit and implicit relations for knowledge concept recommendation in MOOCs (CL-KCRec). Specifically, we first construct a MOOCs heterogeneous information network (HIN) by modeling the data from the MOOC platforms. Then, we utilize a relation-updated graph convolutional network and stacked multi-channel graph neural network to represent the explicit and implicit relations in the HIN, respectively. Considering that the quantity of explicit relations is relatively fewer compared to implicit relations in MOOCs, we propose a contrastive learning with prototypical graph to enhance the representations of both relations to capture their fruitful inherent relational knowledge, which can guide the propagation of students' preferences within the HIN. Based on these enhanced representations, to ensure the balanced contribution of both towards the final recommendation, we propose a dual-head attention mechanism for balanced fusion. Experimental results demonstrate that CL-KCRec outperforms several state-of-the-art baselines on real-world datasets in terms of HR, NDCG and MRR.
- [1394] arXiv:2402.08267 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Improving Image Coding for Machines through Optimizing Encoder via Auxiliary LossComments: This version has been removed by arXiv administrators as the submitter did not have the right to agree to the license at the time of submissionSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Image coding for machines (ICM) aims to compress images for machine analysis using recognition models rather than human vision. Hence, in ICM, it is important for the encoder to recognize and compress the information necessary for the machine recognition task. There are two main approaches in learned ICM; optimization of the compression model based on task loss, and Region of Interest (ROI) based bit allocation. These approaches provide the encoder with the recognition capability. However, optimization with task loss becomes difficult when the recognition model is deep, and ROI-based methods often involve extra overhead during evaluation. In this study, we propose a novel training method for learned ICM models that applies auxiliary loss to the encoder to improve its recognition capability and rate-distortion performance. Our method achieves Bjontegaard Delta rate improvements of 27.7% and 20.3% in object detection and semantic segmentation tasks, compared to the conventional training method.
- [1395] arXiv:2402.08290 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: The Effect of Data Poisoning on Counterfactual ExplanationsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Counterfactual explanations provide a popular method for analyzing the predictions of black-box systems, and they can offer the opportunity for computational recourse by suggesting actionable changes on how to change the input to obtain a different (i.e. more favorable) system output. However, recent work highlighted their vulnerability to different types of manipulations. This work studies the vulnerability of counterfactual explanations to data poisoning. We formalize data poisoning in the context of counterfactual explanations for increasing the cost of recourse on three different levels: locally for a single instance, or a sub-group of instances, or globally for all instances. We demonstrate that state-of-the-art counterfactual generation methods \& toolboxes are vulnerable to such data poisoning.
- [1396] arXiv:2402.08303 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: ChatCell: Facilitating Single-Cell Analysis with Natural LanguageYin Fang , Kangwei Liu , Ningyu Zhang , Xinle Deng , Penghui Yang , Zhuo Chen , Xiangru Tang , Mark Gerstein , Xiaohui Fan , Huajun ChenComments: I have decided to temporarily withdraw this draft as I am in the process of making further revisions to improve its content. Code: this https URL Dataset: this https URL Demo: this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Abstract: As Large Language Models (LLMs) rapidly evolve, their influence in science is becoming increasingly prominent. The emerging capabilities of LLMs in task generalization and free-form dialogue can significantly advance fields like chemistry and biology. However, the field of single-cell biology, which forms the foundational building blocks of living organisms, still faces several challenges. High knowledge barriers and limited scalability in current methods restrict the full exploitation of LLMs in mastering single-cell data, impeding direct accessibility and rapid iteration. To this end, we introduce ChatCell, which signifies a paradigm shift by facilitating single-cell analysis with natural language. Leveraging vocabulary adaptation and unified sequence generation, ChatCell has acquired profound expertise in single-cell biology and the capability to accommodate a diverse range of analysis tasks. Extensive experiments further demonstrate ChatCell's robust performance and potential to deepen single-cell insights, paving the way for more accessible and intuitive exploration in this pivotal field. Our project homepage is available at this https URL .
- [1397] arXiv:2402.08310 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: One-to-many Reconstruction of 3D Geometry of cultural Artifacts using a synthetically trained Generative ModelJournal-ref: 21st Eurographics Workshop on Graphics and Cultural Heritage (GCH 2023)Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Estimating the 3D shape of an object using a single image is a difficult problem. Modern approaches achieve good results for general objects, based on real photographs, but worse results on less expressive representations such as historic sketches. Our automated approach generates a variety of detailed 3D representation from a single sketch, depicting a medieval statue, and can be guided by multi-modal inputs, such as text prompts. It relies solely on synthetic data for training, making it adoptable even in cases of only small numbers of training examples. Our solution allows domain experts such as a curators to interactively reconstruct potential appearances of lost artifacts.
- [1398] arXiv:2402.08323 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Mapping the Ethics of Generative AI: A Comprehensive Scoping ReviewSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: The advent of generative artificial intelligence and the widespread adoption of it in society engendered intensive debates about its ethical implications and risks. These risks often differ from those associated with traditional discriminative machine learning. To synthesize the recent discourse and map its normative concepts, we conducted a scoping review on the ethics of generative artificial intelligence, including especially large language models and text-to-image models. Our analysis provides a taxonomy of 378 normative issues in 19 topic areas and ranks them according to their prevalence in the literature. The study offers a comprehensive overview for scholars, practitioners, or policymakers, condensing the ethical debates surrounding fairness, safety, harmful content, hallucinations, privacy, interaction risks, security, alignment, societal impacts, and others. We discuss the results, evaluate imbalances in the literature, and explore unsubstantiated risk scenarios.
- [1399] arXiv:2402.08324 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Uncertainty Quantification via Stable Distribution PropagationFelix Petersen , Aashwin Mishra , Hilde Kuehne , Christian Borgelt , Oliver Deussen , Mikhail YurochkinComments: Published at ICLR 2024, Code @ this https URLSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: We propose a new approach for propagating stable probability distributions through neural networks. Our method is based on local linearization, which we show to be an optimal approximation in terms of total variation distance for the ReLU non-linearity. This allows propagating Gaussian and Cauchy input uncertainties through neural networks to quantify their output uncertainties. To demonstrate the utility of propagating distributions, we apply the proposed method to predicting calibrated confidence intervals and selective prediction on out-of-distribution data. The results demonstrate a broad applicability of propagating distributions and show the advantages of our method over other approaches such as moment matching.
- [1400] arXiv:2402.08341 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Eliciting Personality Traits in Large Language ModelsComments: Manuscript submitted to ACM Facct. Authors One and Two contributed equally to this workSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large Language Models (LLMs) are increasingly being utilized by both candidates and employers in the recruitment context. However, with this comes numerous ethical concerns, particularly related to the lack of transparency in these "black-box" models. Although previous studies have sought to increase the transparency of these models by investigating the personality traits of LLMs, many of the previous studies have provided them with personality assessments to complete. On the other hand, this study seeks to obtain a better understanding of such models by examining their output variations based on different input prompts. Specifically, we use a novel elicitation approach using prompts derived from common interview questions, as well as prompts designed to elicit particular Big Five personality traits to examine whether the models were susceptible to trait-activation like humans are, to measure their personality based on the language used in their outputs. To do so, we repeatedly prompted multiple LMs with different parameter sizes, including Llama-2, Falcon, Mistral, Bloom, GPT, OPT, and XLNet (base and fine tuned versions) and examined their personality using classifiers trained on the myPersonality dataset. Our results reveal that, generally, all LLMs demonstrate high openness and low extraversion. However, whereas LMs with fewer parameters exhibit similar behaviour in personality traits, newer and LMs with more parameters exhibit a broader range of personality traits, with increased agreeableness, emotional stability, and openness. Furthermore, a greater number of parameters is positively associated with openness and conscientiousness. Moreover, fine-tuned models exhibit minor modulations in their personality traits, contingent on the dataset. Implications and directions for future research are discussed.
- [1401] arXiv:2402.08349 (cross-list from cs.DB) [ pdf , ps , other ]
-
Title: Evaluating the Data Model Robustness of Text-to-SQL Systems Based on Real User QueriesSubjects: Databases (cs.DB) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Text-to-SQL systems (also known as NL-to-SQL systems) have become an increasingly popular solution for bridging the gap between user capabilities and SQL-based data access. These systems translate user requests in natural language to valid SQL statements for a specific database. Recent Text-to-SQL systems have benefited from the rapid improvement of transformer-based language models. However, while Text-to-SQL systems that incorporate such models continuously reach new high scores on -- often synthetic -- benchmark datasets, a systematic exploration of their robustness towards different data models in a real-world, realistic scenario is notably missing. This paper provides the first in-depth evaluation of the data model robustness of Text-to-SQL systems in practice based on a multi-year international project focused on Text-to-SQL interfaces. Our evaluation is based on a real-world deployment of FootballDB, a system that was deployed over a 9 month period in the context of the FIFA World Cup 2022, during which about 6K natural language questions were asked and executed. All of our data is based on real user questions that were asked live to the system. We manually labeled and translated a subset of these questions for three different data models. For each data model, we explore the performance of representative Text-to-SQL systems and language models. We further quantify the impact of training data size, pre-, and post-processing steps as well as language model inference time. Our comprehensive evaluation sheds light on the design choices of real-world Text-to-SQL systems and their impact on moving from research prototypes to real deployments. Last, we provide a new benchmark dataset to the community, which is the first to enable the evaluation of different data models for the same dataset and is substantially more challenging than most previous datasets in terms of query complexity.
- [1402] arXiv:2402.08373 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Time-Series Classification for Dynamic Strategies in Multi-Step ForecastingSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Multi-step forecasting (MSF) in time-series, the ability to make predictions multiple time steps into the future, is fundamental to almost all temporal domains. To make such forecasts, one must assume the recursive complexity of the temporal dynamics. Such assumptions are referred to as the forecasting strategy used to train a predictive model. Previous work shows that it is not clear which forecasting strategy is optimal a priori to evaluating on unseen data. Furthermore, current approaches to MSF use a single (fixed) forecasting strategy.
In this paper, we characterise the instance-level variance of optimal forecasting strategies and propose Dynamic Strategies (DyStrat) for MSF. We experiment using 10 datasets from different scales, domains, and lengths of multi-step horizons. When using a random-forest-based classifier, DyStrat outperforms the best fixed strategy, which is not knowable a priori, 94% of the time, with an average reduction in mean-squared error of 11%. Our approach typically triples the top-1 accuracy compared to current approaches. Notably, we show DyStrat generalises well for any MSF task. - [1403] arXiv:2402.08383 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Uncertainty Quantification for Forward and Inverse Problems of PDEs via Latent Global EvolutionComments: Accepted by AAAI 2024 (Oral)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Deep learning-based surrogate models have demonstrated remarkable advantages over classical solvers in terms of speed, often achieving speedups of 10 to 1000 times over traditional partial differential equation (PDE) solvers. However, a significant challenge hindering their widespread adoption in both scientific and industrial domains is the lack of understanding about their prediction uncertainties, particularly in scenarios that involve critical decision making. To address this limitation, we propose a method that integrates efficient and precise uncertainty quantification into a deep learning-based surrogate model. Our method, termed Latent Evolution of PDEs with Uncertainty Quantification (LE-PDE-UQ), endows deep learning-based surrogate models with robust and efficient uncertainty quantification capabilities for both forward and inverse problems. LE-PDE-UQ leverages latent vectors within a latent space to evolve both the system's state and its corresponding uncertainty estimation. The latent vectors are decoded to provide predictions for the system's state as well as estimates of its uncertainty. In extensive experiments, we demonstrate the accurate uncertainty quantification performance of our approach, surpassing that of strong baselines including deep ensembles, Bayesian neural network layers, and dropout. Our method excels at propagating uncertainty over extended auto-regressive rollouts, making it suitable for scenarios involving long-term predictions. Our code is available at: this https URL .
- [1404] arXiv:2402.08384 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Selective Learning: Towards Robust Calibration with Dynamic RegularizationZongbo Han , Yifeng Yang , Changqing Zhang , Linjun Zhang , Joey Tianyi Zhou , Qinghua Hu , Huaxiu YaoSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Miscalibration in deep learning refers to there is a discrepancy between the predicted confidence and performance. This problem usually arises due to the overfitting problem, which is characterized by learning everything presented in the training set, resulting in overconfident predictions during testing. Existing methods typically address overfitting and mitigate the miscalibration by adding a maximum-entropy regularizer to the objective function. The objective can be understood as seeking a model that fits the ground-truth labels by increasing the confidence while also maximizing the entropy of predicted probabilities by decreasing the confidence. However, previous methods lack clear guidance on confidence adjustment, leading to conflicting objectives (increasing but also decreasing confidence). Therefore, we introduce a method called Dynamic Regularization (DReg), which aims to learn what should be learned during training thereby circumventing the confidence adjusting trade-off. At a high level, DReg aims to obtain a more reliable model capable of acknowledging what it knows and does not know. Specifically, DReg effectively fits the labels for in-distribution samples (samples that should be learned) while applying regularization dynamically to samples beyond model capabilities (e.g., outliers), thereby obtaining a robust calibrated model especially on the samples beyond model capabilities. Both theoretical and empirical analyses sufficiently demonstrate the superiority of DReg compared with previous methods.
- [1405] arXiv:2402.08401 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: LOSS-GAT: Label Propagation and One-Class Semi-Supervised Graph Attention Network for Fake News DetectionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Abstract: In the era of widespread social networks, the rapid dissemination of fake news has emerged as a significant threat, inflicting detrimental consequences across various dimensions of people's lives. Machine learning and deep learning approaches have been extensively employed for identifying fake news. However, a significant challenge in identifying fake news is the limited availability of labeled news datasets. Therefore, the One-Class Learning (OCL) approach, utilizing only a small set of labeled data from the interest class, can be a suitable approach to address this challenge. On the other hand, representing data as a graph enables access to diverse content and structural information, and label propagation methods on graphs can be effective in predicting node labels. In this paper, we adopt a graph-based model for data representation and introduce a semi-supervised and one-class approach for fake news detection, called LOSS-GAT. Initially, we employ a two-step label propagation algorithm, utilizing Graph Neural Networks (GNNs) as an initial classifier to categorize news into two groups: interest (fake) and non-interest (real). Subsequently, we enhance the graph structure using structural augmentation techniques. Ultimately, we predict the final labels for all unlabeled data using a GNN that induces randomness within the local neighborhood of nodes through the aggregation function. We evaluate our proposed method on five common datasets and compare the results against a set of baseline models, including both OCL and binary labeled models. The results demonstrate that LOSS-GAT achieves a notable improvement, surpassing 10%, with the advantage of utilizing only a limited set of labeled fake news. Noteworthy, LOSS-GAT even outperforms binary labeled models.
- [1406] arXiv:2402.08409 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Transferring Ultrahigh-Field Representations for Intensity-Guided Brain Segmentation of Low-Field Magnetic Resonance ImagingComments: 32 pages, 9 figures, and 5 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Ultrahigh-field (UHF) magnetic resonance imaging (MRI), i.e., 7T MRI, provides superior anatomical details of internal brain structures owing to its enhanced signal-to-noise ratio and susceptibility-induced contrast. However, the widespread use of 7T MRI is limited by its high cost and lower accessibility compared to low-field (LF) MRI. This study proposes a deep-learning framework that systematically fuses the input LF magnetic resonance feature representations with the inferred 7T-like feature representations for brain image segmentation tasks in a 7T-absent environment. Specifically, our adaptive fusion module aggregates 7T-like features derived from the LF image by a pre-trained network and then refines them to be effectively assimilable UHF guidance into LF image features. Using intensity-guided features obtained from such aggregation and assimilation, segmentation models can recognize subtle structural representations that are usually difficult to recognize when relying only on LF features. Beyond such advantages, this strategy can seamlessly be utilized by modulating the contrast of LF features in alignment with UHF guidance, even when employing arbitrary segmentation models. Exhaustive experiments demonstrated that the proposed method significantly outperformed all baseline models on both brain tissue and whole-brain segmentation tasks; further, it exhibited remarkable adaptability and scalability by successfully integrating diverse segmentation models and tasks. These improvements were not only quantifiable but also visible in the superlative visual quality of segmentation masks.
- [1407] arXiv:2402.08429 (cross-list from cs.OH) [ pdf , ps , other ]
-
Title: Is 3-(F)WL Enough to Distinguish All 3D Graphs?Subjects: Other Computer Science (cs.OH) ; Artificial Intelligence (cs.AI)
Abstract: The problem of graph isomorphism is an important but challenging problem in the field of graph analysis, for example: analyzing the similarity of two chemical molecules, or studying the expressive ability of graph neural networks. WL test is a method to judge whether two graphs are isomorphic, but it cannot distinguish all non-isomorphic graphs. As an improvement of WL, k-WL has stronger isomorphism discrimination ability, and as k increases, its discrimination ability is strictly increasing. However, whether the isomorphic discrimination power of k-WL is strictly increasing for more complex 3D graphs, or whether there exists k that can discriminate all 3D graphs, remains unexplored. This paper attempts to explore this problem from the perspective of graph generation.
- [1408] arXiv:2402.08430 (cross-list from cs.SE) [ pdf , ps , other ]
-
Title: Analyzing Prompt Influence on Automated Method Generation: An Empirical Study with CopilotJournal-ref: Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension (ICPC 2024)Subjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: Generative AI is changing the way developers interact with software systems, providing services that can produce and deliver new content, crafted to satisfy the actual needs of developers. For instance, developers can ask for new code directly from within their IDEs by writing natural language prompts, and integrated services based on generative AI, such as Copilot, immediately respond to prompts by providing ready-to-use code snippets. Formulating the prompt appropriately, and incorporating the useful information while avoiding any information overload, can be an important factor in obtaining the right piece of code. The task of designing good prompts is known as prompt engineering. In this paper, we systematically investigate the influence of eight prompt features on the style and the content of prompts, on the level of correctness, complexity, size, and similarity to the developers' code of the generated code. We specifically consider the task of using Copilot with 124,800 prompts obtained by systematically combining the eight considered prompt features to generate the implementation of 200 Java methods. Results show how some prompt features, such as the presence of examples and the summary of the purpose of the method, can significantly influence the quality of the result.
- [1409] arXiv:2402.08470 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Parallel-friendly Spatio-Temporal Graph Learning for Photovoltaic Degradation Analysis at ScaleSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: We propose a novel Spatio-Temporal Graph Neural Network empowered trend analysis approach (ST-GTrend) to perform fleet-level performance degradation analysis for Photovoltaic (PV) power networks. PV power stations have become an integral component to the global sustainable energy production landscape. Accurately estimating the performance of PV systems is critical to their feasibility as a power generation technology and as a financial asset. One of the most challenging problems in assessing the Levelized Cost of Energy (LCOE) of a PV system is to understand and estimate the long-term Performance Loss Rate (PLR) for large fleets of PV inverters. ST-GTrend integrates spatio-temporal coherence and graph attention to separate PLR as a long-term "aging" trend from multiple fluctuation terms in the PV input data. To cope with diverse degradation patterns in timeseries, ST-GTrend adopts a paralleled graph autoencoder array to extract aging and fluctuation terms simultaneously. ST-GTrend imposes flatness and smoothness regularization to ensure the disentanglement between aging and fluctuation. To scale the analysis to large PV systems, we also introduce Para-GTrend, a parallel algorithm to accelerate the training and inference of ST-GTrend. We have evaluated ST-GTrend on three large-scale PV datasets, spanning a time period of 10 years. Our results show that ST-GTrend reduces Mean Absolute Percent Error (MAPE) and Euclidean Distances by 34.74% and 33.66% compared to the SOTA methods. Our results demonstrate that Para-GTrend can speed up ST-GTrend by up to 7.92 times. We further verify the generality and effectiveness of ST-GTrend for trend analysis using financial and economic datasets.
- [1410] arXiv:2402.08473 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Intriguing Differences Between Zero-Shot and Systematic Evaluations of Vision-Language Transformer ModelsComments: 30 pages, 30 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Transformer-based models have dominated natural language processing and other areas in the last few years due to their superior (zero-shot) performance on benchmark datasets. However, these models are poorly understood due to their complexity and size. While probing-based methods are widely used to understand specific properties, the structures of the representation space are not systematically characterized; consequently, it is unclear how such models generalize and overgeneralize to new inputs beyond datasets. In this paper, based on a new gradient descent optimization method, we are able to explore the embedding space of a commonly used vision-language model. Using the Imagenette dataset, we show that while the model achieves over 99\% zero-shot classification performance, it fails systematic evaluations completely. Using a linear approximation, we provide a framework to explain the striking differences. We have also obtained similar results using a different model to support that our results are applicable to other transformer models with continuous inputs. We also propose a robust way to detect the modified images.
- [1411] arXiv:2402.08491 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Deep Reinforcement Learning for Controlled Traversing of the Attractor Landscape of Boolean Models in the Context of Cellular ReprogrammingSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Molecular Networks (q-bio.MN); Quantitative Methods (q-bio.QM)
Abstract: Cellular reprogramming can be used for both the prevention and cure of different diseases. However, the efficiency of discovering reprogramming strategies with classical wet-lab experiments is hindered by lengthy time commitments and high costs. In this study, we develop a novel computational framework based on deep reinforcement learning that facilitates the identification of reprogramming strategies. For this aim, we formulate a control problem in the context of cellular reprogramming for the frameworks of BNs and PBNs under the asynchronous update mode. Furthermore, we introduce the notion of a pseudo-attractor and a procedure for identification of pseudo-attractor state during training. Finally, we devise a computational framework for solving the control problem, which we test on a number of different models.
- [1412] arXiv:2402.08496 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: A Systematic Review of Data-to-Text NLGSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This systematic review undertakes a comprehensive analysis of current research on data-to-text generation, identifying gaps, challenges, and future directions within the field. Relevant literature in this field on datasets, evaluation metrics, application areas, multilingualism, language models, and hallucination mitigation methods is reviewed. Various methods for producing high-quality text are explored, addressing the challenge of hallucinations in data-to-text generation. These methods include re-ranking, traditional and neural pipeline architecture, planning architectures, data cleaning, controlled generation, and modification of models and training techniques. Their effectiveness and limitations are assessed, highlighting the need for universally applicable strategies to mitigate hallucinations. The review also examines the usage, popularity, and impact of datasets, alongside evaluation metrics, with an emphasis on both automatic and human assessment. Additionally, the evolution of data-to-text models, particularly the widespread adoption of transformer models, is discussed. Despite advancements in text quality, the review emphasizes the importance of research in low-resourced languages and the engineering of datasets in these languages to promote inclusivity. Finally, several application domains of data-to-text are highlighted, emphasizing their relevance in such domains. Overall, this review serves as a guiding framework for fostering innovation and advancing data-to-text generation.
- [1413] arXiv:2402.08509 (cross-list from cs.DB) [ pdf , ps , other ]
-
Title: From Shapes to Shapes: Inferring SHACL Shapes for Results of SPARQL CONSTRUCT Queries (Extended Version)Comments: 19 pages, 5 figuresSubjects: Databases (cs.DB) ; Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
Abstract: SPARQL CONSTRUCT queries allow for the specification of data processing pipelines that transform given input graphs into new output graphs. It is now common to constrain graphs through SHACL shapes allowing users to understand which data they can expect and which not. However, it becomes challenging to understand what graph data can be expected at the end of a data processing pipeline without knowing the particular input data: Shape constraints on the input graph may affect the output graph, but may no longer apply literally, and new shapes may be imposed by the query template. In this paper, we study the derivation of shape constraints that hold on all possible output graphs of a given SPARQL CONSTRUCT query. We assume that the SPARQL CONSTRUCT query is fixed, e.g., being part of a program, whereas the input graphs adhere to input shape constraints but may otherwise vary over time and, thus, are mostly unknown. We study a fragment of SPARQL CONSTRUCT queries (SCCQ) and a fragment of SHACL (Simple SHACL). We formally define the problem of deriving the most restrictive set of Simple SHACL shapes that constrain the results from evaluating a SCCQ over any input graph restricted by a given set of Simple SHACL shapes. We propose and implement an algorithm that statically analyses input SHACL shapes and CONSTRUCT queries and prove its soundness and complexity.
- [1414] arXiv:2402.08530 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: A Distributional Analogue to the Successor RepresentationHarley Wiltzer , Jesse Farebrother , Arthur Gretton , Yunhao Tang , André Barreto , Will Dabney , Marc G. Bellemare , Mark RowlandSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process. Analogous to how the successor representation (SR) describes the expected consequences of behaving according to a given policy, our distributional successor measure (SM) describes the distributional consequences of this behaviour. We formulate the distributional SM as a distribution over distributions and provide theory connecting it with distributional and model-based reinforcement learning. Moreover, we propose an algorithm that learns the distributional SM from data by minimizing a two-level maximum mean discrepancy. Key to our method are a number of algorithmic techniques that are independently valuable for learning generative models of state. As an illustration of the usefulness of the distributional SM, we show that it enables zero-shot risk-sensitive policy evaluation in a way that was not previously possible.
- [1415] arXiv:2402.08547 (cross-list from cs.GT) [ pdf , ps , html , other ]
-
Title: Dueling Over Dessert, Mastering the Art of Repeated Cake CuttingSubjects: Computer Science and Game Theory (cs.GT) ; Artificial Intelligence (cs.AI); Theoretical Economics (econ.TH)
Abstract: We consider the setting of repeated fair division between two players, denoted Alice and Bob, with private valuations over a cake. In each round, a new cake arrives, which is identical to the ones in previous rounds. Alice cuts the cake at a point of her choice, while Bob chooses the left piece or the right piece, leaving the remainder for Alice. We consider two versions: sequential, where Bob observes Alice's cut point before choosing left/right, and simultaneous, where he only observes her cut point after making his choice. The simultaneous version was first considered by Aumann and Maschler (1995).
We observe that if Bob is almost myopic and chooses his favorite piece too often, then he can be systematically exploited by Alice through a strategy akin to a binary search. This strategy allows Alice to approximate Bob's preferences with increasing precision, thereby securing a disproportionate share of the resource over time.
We analyze the limits of how much a player can exploit the other one and show that fair utility profiles are in fact achievable. Specifically, the players can enforce the equitable utility profile of $(1/2, 1/2)$ in the limit on every trajectory of play, by keeping the other player's utility to approximately $1/2$ on average while guaranteeing they themselves get at least approximately $1/2$ on average. We show this theorem using a connection with Blackwell approachability.
Finally, we analyze a natural dynamic known as fictitious play, where players best respond to the empirical distribution of the other player. We show that fictitious play converges to the equitable utility profile of $(1/2, 1/2)$ at a rate of $O(1/\sqrt{T})$. - [1416] arXiv:2402.08562 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Higher Layers Need More LoRA ExpertsChongyang Gao , Kezhen Chen , Jinmeng Rao , Baochen Sun , Ruibo Liu , Daiyi Peng , Yawen Zhang , Xiaoyuan Guo , Jie Yang , VS SubrahmanianComments: The code is available at this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Parameter-efficient tuning (PEFT) techniques like low-rank adaptation (LoRA) offer training efficiency on Large Language Models, but their impact on model performance remains limited. Recent efforts integrate LoRA and Mixture-of-Experts (MoE) to improve the performance of PEFT methods. Despite promising results, research on improving the efficiency of LoRA with MoE is still in its early stages. Recent studies have shown that experts in the MoE architecture have different strengths and also exhibit some redundancy. Does this statement also apply to parameter-efficient MoE? In this paper, we introduce a novel parameter-efficient MoE method, \textit{\textbf{M}oE-L\textbf{o}RA with \textbf{L}ayer-wise Expert \textbf{A}llocation (MoLA)} for Transformer-based models, where each model layer has the flexibility to employ a varying number of LoRA experts. We investigate several architectures with varying layer-wise expert configurations. Experiments on six well-known NLP and commonsense QA benchmarks demonstrate that MoLA achieves equal or superior performance compared to all baselines. We find that allocating more LoRA experts to higher layers further enhances the effectiveness of models with a certain number of experts in total. With much fewer parameters, this allocation strategy outperforms the setting with the same number of experts in every layer. This work can be widely used as a plug-and-play parameter-efficient tuning approach for various applications. The code is available at this https URL .
- [1417] arXiv:2402.08570 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: Online Foundation Model Selection in RoboticsSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Foundation models have recently expanded into robotics after excelling in computer vision and natural language processing. The models are accessible in two ways: open-source or paid, closed-source options. Users with access to both face a problem when deciding between effective yet costly closed-source models and free but less powerful open-source alternatives. We call it the model selection problem. Existing supervised-learning methods are impractical due to the high cost of collecting extensive training data from closed-source models. Hence, we focus on the online learning setting where algorithms learn while collecting data, eliminating the need for large pre-collected datasets. We thus formulate a user-centric online model selection problem and propose a novel solution that combines an open-source encoder to output context and an online learning algorithm that processes this context. The encoder distills vast data distributions into low-dimensional features, i.e., the context, without additional training. The online learning algorithm aims to maximize a composite reward that includes model performance, execution time, and costs based on the context extracted from the data. It results in an improved trade-off between selecting open-source and closed-source models compared to non-contextual methods, as validated by our theoretical analysis. Experiments across language-based robotic tasks such as Waymo Open Dataset, ALFRED, and Open X-Embodiment demonstrate real-world applications of the solution. The results show that the solution significantly improves the task success rate by up to 14%.
- [1418] arXiv:2402.08578 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: FedLPS: Heterogeneous Federated Learning for Multiple Tasks with Local Parameter SharingComments: Accepted by AAAI 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: Federated Learning (FL) has emerged as a promising solution in Edge Computing (EC) environments to process the proliferation of data generated by edge devices. By collaboratively optimizing the global machine learning models on distributed edge devices, FL circumvents the need for transmitting raw data and enhances user privacy. Despite practical successes, FL still confronts significant challenges including constrained edge device resources, multiple tasks deployment, and data heterogeneity. However, existing studies focus on mitigating the FL training costs of each single task whereas neglecting the resource consumption across multiple tasks in heterogeneous FL scenarios. In this paper, we propose Heterogeneous Federated Learning with Local Parameter Sharing (FedLPS) to fill this gap. FedLPS leverages principles from transfer learning to facilitate the deployment of multiple tasks on a single device by dividing the local model into a shareable encoder and task-specific encoders. To further reduce resource consumption, a channel-wise model pruning algorithm that shrinks the footprint of local models while accounting for both data and system heterogeneity is employed in FedLPS. Additionally, a novel heterogeneous model aggregation algorithm is proposed to aggregate the heterogeneous predictors in FedLPS. We implemented the proposed FedLPS on a real FL platform and compared it with state-of-the-art (SOTA) FL frameworks. The experimental results on five popular datasets and two modern DNN models illustrate that the proposed FedLPS significantly outperforms the SOTA FL frameworks by up to 4.88% and reduces the computational resource consumption by 21.3%. Our code is available at: this https URL .
- [1419] arXiv:2402.08582 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: FESS Loss: Feature-Enhanced Spatial Segmentation Loss for Optimizing Medical Image AnalysisComments: 5 Pages, 3 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Medical image segmentation is a critical process in the field of medical imaging, playing a pivotal role in diagnosis, treatment, and research. It involves partitioning of an image into multiple regions, representing distinct anatomical or pathological structures. Conventional methods often grapple with the challenge of balancing spatial precision and comprehensive feature representation due to their reliance on traditional loss functions. To overcome this, we propose Feature-Enhanced Spatial Segmentation Loss (FESS Loss), that integrates the benefits of contrastive learning (which extracts intricate features, particularly in the nuanced domain of medical imaging) with the spatial accuracy inherent in the Dice loss. The objective is to augment both spatial precision and feature-based representation in the segmentation of medical images. FESS Loss signifies a notable advancement, offering a more accurate and refined segmentation process, ultimately contributing to heightened precision in the analysis of medical images. Further, FESS loss demonstrates superior performance in limited annotated data availability scenarios often present in the medical domain.
- [1420] arXiv:2402.08593 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Graph Feature Preprocessor: Real-time Extraction of Subgraph-based Features from Transaction GraphsJovan Blanuša , Maximo Cravero Baraja , Andreea Anghel , Luc von Niederhäusern , Erik Altman , Haris Pozidis , Kubilay AtasuSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In this paper, we present "Graph Feature Preprocessor", a software library for detecting typical money laundering and fraud patterns in financial transaction graphs in real time. These patterns are used to produce a rich set of transaction features for downstream machine learning training and inference tasks such as money laundering detection. We show that our enriched transaction features dramatically improve the prediction accuracy of gradient-boosting-based machine learning models. Our library exploits multicore parallelism, maintains a dynamic in-memory graph, and efficiently mines subgraph patterns in the incoming transaction stream, which enables it to be operated in a streaming manner. We evaluate our library using highly-imbalanced synthetic anti-money laundering (AML) and real-life Ethereum phishing datasets. In these datasets, the proportion of illicit transactions is very small, which makes the learning process challenging. Our solution, which combines our Graph Feature Preprocessor and gradient-boosting-based machine learning models, is able to detect these illicit transactions with higher minority-class F1 scores than standard graph neural networks. In addition, the end-to-end throughput rate of our solution executed on a multicore CPU outperforms the graph neural network baselines executed on a powerful V100 GPU. Overall, the combination of high accuracy, a high throughput rate, and low latency of our solution demonstrates the practical value of our library in real-world applications. Graph Feature Preprocessor has been integrated into IBM mainframe software products, namely "IBM Cloud Pak for Data on Z" and "AI Toolkit for IBM Z and LinuxONE".
- [1421] arXiv:2402.08609 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Mixtures of Experts Unlock Parameter Scaling for Deep RLJohan Obando-Ceron , Ghada Sokar , Timon Willi , Clare Lyle , Jesse Farebrother , Jakob Foerster , Gintare Karolina Dziugaite , Doina Precup , Pablo Samuel CastroSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance scales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning domains, however, where increasing the parameter count of a model often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-Expert (MoE) modules, and in particular Soft MoEs (Puigcerver et al., 2023), into value-based networks results in more parameter-scalable models, evidenced by substantial performance increases across a variety of training regimes and model sizes. This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.
- [1422] arXiv:2402.08611 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: A Cost-Sensitive Transformer Model for Prognostics Under Highly Imbalanced Industrial DataAli Beikmohammadi , Mohammad Hosein Hamian , Neda Khoeyniha , Tony Lindgren , Olof Steinert , Sindri MagnússonSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The rapid influx of data-driven models into the industrial sector has been facilitated by the proliferation of sensor technology, enabling the collection of vast quantities of data. However, leveraging these models for failure detection and prognosis poses significant challenges, including issues like missing values and class imbalances. Moreover, the cost sensitivity associated with industrial operations further complicates the application of conventional models in this context. This paper introduces a novel cost-sensitive transformer model developed as part of a systematic workflow, which also integrates a hybrid resampler and a regression-based imputer. After subjecting our approach to rigorous testing using the APS failure dataset from Scania trucks and the SECOM dataset, we observed a substantial enhancement in performance compared to state-of-the-art methods. Moreover, we conduct an ablation study to analyze the contributions of different components in our proposed method. Our findings highlight the potential of our method in addressing the unique challenges of failure prediction in industrial settings, thereby contributing to enhanced reliability and efficiency in industrial operations.
- [1423] arXiv:2402.08631 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Knowledge Editing on Black-box Large Language ModelsComments: Work in progressSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Knowledge editing (KE) aims to efficiently and precisely modify the behavior of large language models (LLMs) to update specific knowledge without negatively influencing other knowledge. Current research primarily focuses on white-box LLMs editing, overlooking an important scenario: black-box LLMs editing, where LLMs are accessed through interfaces and only textual output is available. In this paper, we first officially introduce KE on black-box LLMs and then propose a comprehensive evaluation framework to overcome the limitations of existing evaluations that are not applicable to black-box LLMs editing and lack comprehensiveness. To tackle privacy leaks of editing data and style over-editing in current methods, we introduce a novel postEdit framework, resolving privacy concerns through downstream post-processing and maintaining textual style consistency via fine-grained editing to original responses. Experiments and analysis on two benchmarks demonstrate that postEdit outperforms all baselines and achieves strong generalization, especially with huge improvements on style retention (average $+20.82\%\uparrow$).
- [1424] arXiv:2402.08640 (cross-list from cs.DL) [ pdf , ps , html , other ]
-
Title: Forecasting high-impact research topics via machine learning on evolving knowledge graphsComments: 11 pages, 7 figures, Comments welcome!Subjects: Digital Libraries (cs.DL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The exponential growth in scientific publications poses a severe challenge for human researchers. It forces attention to more narrow sub-fields, which makes it challenging to discover new impactful research ideas and collaborations outside one's own field. While there are ways to predict a scientific paper's future citation counts, they need the research to be finished and the paper written, usually assessing impact long after the idea was conceived. Here we show how to predict the impact of onsets of ideas that have never been published by researchers. For that, we developed a large evolving knowledge graph built from more than 21 million scientific papers. It combines a semantic network created from the content of the papers and an impact network created from the historic citations of papers. Using machine learning, we can predict the dynamic of the evolving network into the future with high accuracy, and thereby the impact of new research directions. We envision that the ability to predict the impact of new ideas will be a crucial component of future artificial muses that can inspire new impactful and interesting scientific ideas.
- [1425] arXiv:2402.08648 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Generating Universal Adversarial Perturbations for Quantum ClassifiersComments: Accepted at AAAI 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Quantum Machine Learning (QML) has emerged as a promising field of research, aiming to leverage the capabilities of quantum computing to enhance existing machine learning methodologies. Recent studies have revealed that, like their classical counterparts, QML models based on Parametrized Quantum Circuits (PQCs) are also vulnerable to adversarial attacks. Moreover, the existence of Universal Adversarial Perturbations (UAPs) in the quantum domain has been demonstrated theoretically in the context of quantum classifiers. In this work, we introduce QuGAP: a novel framework for generating UAPs for quantum classifiers. We conceptualize the notion of additive UAPs for PQC-based classifiers and theoretically demonstrate their existence. We then utilize generative models (QuGAP-A) to craft additive UAPs and experimentally show that quantum classifiers are susceptible to such attacks. Moreover, we formulate a new method for generating unitary UAPs (QuGAP-U) using quantum generative models and a novel loss function based on fidelity constraints. We evaluate the performance of the proposed framework and show that our method achieves state-of-the-art misclassification rates, while maintaining high fidelity between legitimate and adversarial samples.
- [1426] arXiv:2402.08653 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: SAGMAN: Stability Analysis of Graph Neural Networks on the ManifoldsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Modern graph neural networks (GNNs) can be sensitive to changes in the input graph structure and node features, potentially resulting in unpredictable behavior and degraded performance. In this work, we introduce a spectral framework known as SAGMAN for examining the stability of GNNs. This framework assesses the distance distortions that arise from the nonlinear mappings of GNNs between the input and output manifolds: when two nearby nodes on the input manifold are mapped (through a GNN model) to two distant ones on the output manifold, it implies a large distance distortion and thus a poor GNN stability. We propose a distance-preserving graph dimension reduction (GDR) approach that utilizes spectral graph embedding and probabilistic graphical models (PGMs) to create low-dimensional input/output graph-based manifolds for meaningful stability analysis. Our empirical evaluations show that SAGMAN effectively assesses the stability of each node when subjected to various edge or feature perturbations, offering a scalable approach for evaluating the stability of GNNs, extending to applications within recommendation systems. Furthermore, we illustrate its utility in downstream tasks, notably in enhancing GNN stability and facilitating adversarial targeted attacks.
- [1427] arXiv:2402.08658 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: The Last JITAI? The Unreasonable Effectiveness of Large Language Models in Issuing Just-in-Time Adaptive Interventions: Fostering Physical Activity in a Prospective Cardiac Rehabilitation SettingDavid Haag , Devender Kumar , Sebastian Gruber , Mahdi Sareban , Gunnar Treff , Josef Niebauer , Christopher Bull , Jan David SmeddinckSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: We investigated the viability of using Large Language Models (LLMs) for triggering and personalizing content for Just-in-Time Adaptive Interventions (JITAIs) in digital health. JITAIs are being explored as a key mechanism for sustainable behavior change, adapting interventions to an individual's current context and needs. However, traditional rule-based and machine learning models for JITAI implementation face scalability and flexibility limitations, such as lack of personalization, difficulty in managing multi-parametric systems, and issues with data sparsity. To investigate JITAI implementation via LLMs, we tested the contemporary overall performance-leading model 'GPT-4' with examples grounded in the use case of fostering heart-healthy physical activity in outpatient cardiac rehabilitation. Three personas and five sets of context information per persona were used as a basis of triggering and personalizing JITAIs. Subsequently, we generated a total of 450 proposed JITAI decisions and message content, divided equally into JITAIs generated by 10 iterations with GPT-4, a baseline provided by 10 laypersons (LayPs), and a gold standard set by 10 healthcare professionals (HCPs). Ratings from 27 LayPs and 11 HCPs indicated that JITAIs generated by GPT-4 were superior to those by HCPs and LayPs over all assessed scales: i.e., appropriateness, engagement, effectiveness, and professionality. This study indicates that LLMs have significant potential for implementing JITAIs as a building block of personalized or "precision" health, offering scalability, effective personalization based on opportunistically sampled information, and good acceptability.
- [1428] arXiv:2402.08671 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Are Semi-Dense Detector-Free Methods Good at Matching Local Features?Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Semi-dense detector-free approaches (SDF), such as LoFTR, are currently among the most popular image matching methods. While SDF methods are trained to establish correspondences between two images, their performances are almost exclusively evaluated using relative pose estimation metrics. Thus, the link between their ability to establish correspondences and the quality of the resulting estimated pose has thus far received little attention. This paper is a first attempt to study this link. We start with proposing a novel structured attention-based image matching architecture (SAM). It allows us to show a counter-intuitive result on two datasets (MegaDepth and HPatches): on the one hand SAM either outperforms or is on par with SDF methods in terms of pose/homography estimation metrics, but on the other hand SDF approaches are significantly better than SAM in terms of matching accuracy. We then propose to limit the computation of the matching accuracy to textured regions, and show that in this case SAM often surpasses SDF methods. Our findings highlight a strong correlation between the ability to establish accurate correspondences in textured regions and the accuracy of the resulting estimated pose/homography. Our code will be made available.
- [1429] arXiv:2402.08672 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Model Assessment and Selection under Temporal Distribution ShiftComments: 24 pages, 6 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Methodology (stat.ME)
Abstract: We investigate model assessment and selection in a changing environment, by synthesizing datasets from both the current time period and historical epochs. To tackle unknown and potentially arbitrary temporal distribution shift, we develop an adaptive rolling window approach to estimate the generalization error of a given model. This strategy also facilitates the comparison between any two candidate models by estimating the difference of their generalization errors. We further integrate pairwise comparisons into a single-elimination tournament, achieving near-optimal model selection from a collection of candidates. Theoretical analyses and numerical experiments demonstrate the adaptivity of our proposed methods to the non-stationarity in data.
- [1430] arXiv:2402.08679 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: COLD-Attack: Jailbreaking LLMs with Stealthiness and ControllabilitySubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Jailbreaks on Large language models (LLMs) have recently received increasing attention. For a comprehensive assessment of LLM safety, it is essential to consider jailbreaks with diverse attributes, such as contextual coherence and sentiment/stylistic variations, and hence it is beneficial to study controllable jailbreaking, i.e. how to enforce control on LLM attacks. In this paper, we formally formulate the controllable attack generation problem, and build a novel connection between this problem and controllable text generation, a well-explored topic of natural language processing. Based on this connection, we adapt the Energy-based Constrained Decoding with Langevin Dynamics (COLD), a state-of-the-art, highly efficient algorithm in controllable text generation, and introduce the COLD-Attack framework which unifies and automates the search of adversarial LLM attacks under a variety of control requirements such as fluency, stealthiness, sentiment, and left-right-coherence. The controllability enabled by COLD-Attack leads to diverse new jailbreak scenarios which not only cover the standard setting of generating fluent suffix attacks, but also allow us to address new controllable attack settings such as revising a user query adversarially with minimal paraphrasing, and inserting stealthy attacks in context with left-right-coherence. Our extensive experiments on various LLMs (Llama-2, Mistral, Vicuna, Guanaco, GPT-3.5) show COLD-Attack's broad applicability, strong controllability, high success rate, and attack transferability. Our code is available at this https URL .
- [1431] arXiv:2402.08680 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free GuidanceComments: 27 pages, 20 figures, 4 tablesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Abstract: The advancement of Large Vision-Language Models (LVLMs) has increasingly highlighted the critical issue of their tendency to hallucinate non-existing objects in the images. To address this issue, previous works focused on using specially curated datasets or powerful LLMs (e.g., GPT-3.5) to rectify the outputs of LVLMs. However, these approaches require either expensive training/fine-tuning or API access to advanced LLMs to correct the model's output post-generation. In this paper, we tackle this challenge by introducing a framework called Mitigating hallucinAtion via classifieR-Free guIdaNcE (MARINE), which is both training-free and API-free, and can effectively and efficiently reduce object hallucinations during the generation process. Specifically, MARINE enriches the visual context of LVLMs by integrating existing open-source vision models, and employs classifier-free guidance to incorporate the additional object grounding features to improve the precision of LVLMs' generations. Through comprehensive evaluations across $6$ popular LVLMs with diverse evaluation metrics, we demonstrate the effectiveness of MARINE, which even outperforms existing fine-tuning-based methods. Remarkably, it not only reduces hallucinations but also improves the detailedness of LVLMs' generations, as assessed by GPT-4V.
- [1432] arXiv:2402.08682 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D GenerationLuke Melas-Kyriazi , Iro Laina , Christian Rupprecht , Natalia Neverova , Andrea Vedaldi , Oran Gafni , Filippos KokkinosSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Most text-to-3D generators build upon off-the-shelf text-to-image models trained on billions of images. They use variants of Score Distillation Sampling (SDS), which is slow, somewhat unstable, and prone to artifacts. A mitigation is to fine-tune the 2D generator to be multi-view aware, which can help distillation or can be combined with reconstruction networks to output 3D objects directly. In this paper, we further explore the design space of text-to-3D models. We significantly improve multi-view generation by considering video instead of image generators. Combined with a 3D reconstruction algorithm which, by using Gaussian splatting, can optimize a robust image-based loss, we directly produce high-quality 3D outputs from the generated views. Our new method, IM-3D, reduces the number of evaluations of the 2D generator network 10-100x, resulting in a much more efficient pipeline, better quality, fewer geometric inconsistencies, and higher yield of usable 3D assets.
- [1433] arXiv:2402.08690 (cross-list from cs.SI) [ pdf , ps , html , other ]
-
Title: If Turing played piano with an artificial partnerSubjects: Social and Information Networks (cs.SI) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
Abstract: Music is an inherently social activity that allows people to share experiences and feel connected with one another. There has been little progress in designing artificial partners exhibiting a similar social experience as playing with another person. Neural network architectures that implement generative models, such as large language models, are suited for producing musical scores. Playing music socially, however, involves more than playing a score; it must complement the other musicians' ideas and keep time correctly. We addressed the question of whether a convincing social experience is made possible by a generative model trained to produce musical scores, not necessarily optimized for synchronization and continuation. The network, a variational autoencoder trained on a large corpus of digital scores, was adapted for a timed call-and-response task with a human partner. Participants played piano with a human or artificial partner-in various configurations-and rated the performance quality and first-person experience of self-other integration. Overall, the artificial partners held promise but were rated lower than human partners. The artificial partner with simplest design and highest similarity parameter was not rated differently from the human partners on some measures, suggesting that interactive rather than generative sophistication is important in enabling social AI.
- [1434] arXiv:2402.08702 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: PRompt Optimization in Multi-Step Tasks (PROMST): Integrating Human Feedback and Preference AlignmentComments: 58 pages, 13 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Robotics (cs.RO)
Abstract: Prompt optimization aims to find the best prompt to a large language model (LLM) for a given task. LLMs have been successfully used to help find and improve prompt candidates for single-step tasks. However, realistic tasks for agents are multi-step and introduce new challenges: (1) Prompt content is likely to be more extensive and complex, making it more difficult for LLMs to analyze errors, (2) the impact of an individual step is difficult to evaluate, and (3) different people may have varied preferences about task execution. While humans struggle to optimize prompts, they are good at providing feedback about LLM outputs; we therefore introduce a new LLM-driven discrete prompt optimization framework that incorporates human-designed feedback rules to automatically offer direct suggestions for improvement. We also use an extra learned heuristic model that predicts prompt performance to efficiently sample from prompt candidates. This approach significantly outperforms both human-engineered prompts and several other prompt optimization methods across 11 representative multi-step tasks (an average 10.6%-29.3% improvement to current best methods on five LLMs respectively). We further show that the score function for tasks can be modified to better align with individual preferences. We believe our work can serve as a benchmark for automatic prompt optimization for LLM-driven multi-step tasks.
- [1435] arXiv:2402.08703 (cross-list from q-bio.BM) [ pdf , ps , html , other ]
-
Title: A Survey of Generative AI for De Novo Drug Design: New Frontiers in Molecule and Protein GenerationSubjects: Biomolecules (q-bio.BM) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Artificial intelligence (AI)-driven methods can vastly improve the historically costly drug design process, with various generative models already in widespread use. Generative models for de novo drug design, in particular, focus on the creation of novel biological compounds entirely from scratch, representing a promising future direction. Rapid development in the field, combined with the inherent complexity of the drug design process, creates a difficult landscape for new researchers to enter. In this survey, we organize de novo drug design into two overarching themes: small molecule and protein generation. Within each theme, we identify a variety of subtasks and applications, highlighting important datasets, benchmarks, and model architectures and comparing the performance of top models. We take a broad approach to AI-driven drug design, allowing for both micro-level comparisons of various methods within each subtask and macro-level observations across different fields. We discuss parallel challenges and approaches between the two applications and highlight future directions for AI-driven de novo drug design as a whole. An organized repository of all covered sources is available at this https URL .
- [1436] arXiv:2402.08714 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion ModelsComments: CVPR 2024. Project page: this https URLSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Reward finetuning has emerged as a promising approach to aligning foundation models with downstream objectives. Remarkable success has been achieved in the language domain by using reinforcement learning (RL) to maximize rewards that reflect human preference. However, in the vision domain, existing RL-based reward finetuning methods are limited by their instability in large-scale training, rendering them incapable of generalizing to complex, unseen prompts. In this paper, we propose Proximal Reward Difference Prediction (PRDP), enabling stable black-box reward finetuning for diffusion models for the first time on large-scale prompt datasets with over 100K prompts. Our key innovation is the Reward Difference Prediction (RDP) objective that has the same optimal solution as the RL objective while enjoying better training stability. Specifically, the RDP objective is a supervised regression objective that tasks the diffusion model with predicting the reward difference of generated image pairs from their denoising trajectories. We theoretically prove that the diffusion model that obtains perfect reward difference prediction is exactly the maximizer of the RL objective. We further develop an online algorithm with proximal updates to stably optimize the RDP objective. In experiments, we demonstrate that PRDP can match the reward maximization ability of well-established RL-based methods in small-scale training. Furthermore, through large-scale training on text prompts from the Human Preference Dataset v2 and the Pick-a-Pic v1 dataset, PRDP achieves superior generation quality on a diverse set of complex, unseen prompts whereas RL-based methods completely fail.
- [1437] arXiv:2402.08761 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: JAMDEC: Unsupervised Authorship Obfuscation using Constrained Decoding over Small Language ModelsComments: Code is available at this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The permanence of online content combined with the enhanced authorship identification techniques calls for stronger computational methods to protect the identity and privacy of online authorship when needed, e.g., blind reviews for scientific papers, anonymous online reviews, or anonymous interactions in the mental health forums. In this paper, we propose an unsupervised inference-time approach to authorship obfuscation to address the unique challenges of authorship obfuscation: lack of supervision data for diverse authorship and domains, and the need for a sufficient level of revision beyond simple paraphrasing to obfuscate the authorship, all the while preserving the original content and fluency.
We introduce JAMDEC, a user-controlled, inference-time algorithm for authorship obfuscation that can be in principle applied to any text and authorship. Our approach builds on small language models such as GPT2-XL in order to help avoid disclosing the original content to proprietary LLM's APIs, while also reducing the performance gap between small and large language models via algorithmic enhancement. The key idea behind our approach is to boost the creative power of smaller language models through constrained decoding, while also allowing for user-specified controls and flexibility. Experimental results demonstrate that our approach based on GPT2-XL outperforms previous state-of-the-art methods based on comparably small models, while performing competitively against GPT3.5 175B, a propriety model that is two orders of magnitudes larger. - [1438] arXiv:2402.08777 (cross-list from q-bio.GN) [ pdf , ps , other ]
-
Title: DNABERT-S: Learning Species-Aware DNA Embedding with Genome Foundation ModelsZhihan Zhou , Weimin Wu , Harrison Ho , Jiayi Wang , Lizhen Shi , Ramana V Davuluri , Zhong Wang , Han LiuSubjects: Genomics (q-bio.GN) ; Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Computation and Language (cs.CL)
Abstract: Effective DNA embedding remains crucial in genomic analysis, particularly in scenarios lacking labeled data for model fine-tuning, despite the significant advancements in genome foundation models. A prime example is metagenomics binning, a critical process in microbiome research that aims to group DNA sequences by their species from a complex mixture of DNA sequences derived from potentially thousands of distinct, often uncharacterized species. To fill the lack of effective DNA embedding models, we introduce DNABERT-S, a genome foundation model that specializes in creating species-aware DNA embeddings. To encourage effective embeddings to error-prone long-read DNA sequences, we introduce Manifold Instance Mixup (MI-Mix), a contrastive objective that mixes the hidden representations of DNA sequences at randomly selected layers and trains the model to recognize and differentiate these mixed proportions at the output layer. We further enhance it with the proposed Curriculum Contrastive Learning (C$^2$LR) strategy. Empirical results on 18 diverse datasets showed DNABERT-S's remarkable performance. It outperforms the top baseline's performance in 10-shot species classification with just a 2-shot training while doubling the Adjusted Rand Index (ARI) in species clustering and substantially increasing the number of correctly identified species in metagenomics binning. The code, data, and pre-trained model are publicly available at this https URL .
- [1439] arXiv:2402.08789 (cross-list from eess.AS) [ pdf , ps , html , other ]
-
Title: Leveraging cough sounds to optimize chest x-ray usage in low-resource settingsAlexander Philip , Sanya Chawla , Lola Jover , George P. Kafentzis , Joe Brew , Vishakh Saraf , Shibu Vijayan , Peter Small , Carlos ChaccourSubjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
Abstract: Chest X-ray is a commonly used tool during triage, diagnosis and management of respiratory diseases. In resource-constricted settings, optimizing this resource can lead to valuable cost savings for the health care system and the patients as well as to and improvement in consult time. We used prospectively-collected data from 137 patients referred for chest X-ray at the Christian Medical Center and Hospital (CMCH) in Purnia, Bihar, India. Each patient provided at least five coughs while awaiting radiography. Collected cough sounds were analyzed using acoustic AI methods. Cross-validation was done on temporal and spectral features on the cough sounds of each patient. Features were summarized using standard statistical approaches. Three models were developed, tested and compared in their capacity to predict an abnormal result in the chest X-ray. All three methods yielded models that could discriminate to some extent between normal and abnormal with the logistic regression performing best with an area under the receiver operating characteristic curves ranging from 0.7 to 0.78. Despite limitations and its relatively small sample size, this study shows that AI-enabled algorithms can use cough sounds to predict which individuals presenting for chest radiographic examination will have a normal or abnormal results. These results call for expanding this research given the potential optimization of limited health care resources in low- and middle-income countries.
- [1440] arXiv:2402.08801 (cross-list from cs.SE) [ pdf , ps , html , other ]
-
Title: ChatGPT vs LLaMA: Impact, Reliability, and Challenges in Stack Overflow DiscussionsComments: 36 pages, 9 figuresSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: Since its release in November 2022, ChatGPT has shaken up Stack Overflow, the premier platform for developers' queries on programming and software development. Demonstrating an ability to generate instant, human-like responses to technical questions, ChatGPT has ignited debates within the developer community about the evolving role of human-driven platforms in the age of generative AI. Two months after ChatGPT's release, Meta released its answer with its own Large Language Model (LLM) called LLaMA: the race was on. We conducted an empirical study analyzing questions from Stack Overflow and using these LLMs to address them. This way, we aim to (ii) measure user engagement evolution with Stack Overflow over time; (ii) quantify the reliability of LLMs' answers and their potential to replace Stack Overflow in the long term; (iii) identify and understand why LLMs fails; and (iv) compare LLMs together. Our empirical results are unequivocal: ChatGPT and LLaMA challenge human expertise, yet do not outperform it for some domains, while a significant decline in user posting activity has been observed. Furthermore, we also discuss the impact of our findings regarding the usage and development of new LLMs.
- [1441] arXiv:2402.08812 (cross-list from cs.HC) [ pdf , ps , html , other ]
-
Title: Intelligent Canvas: Enabling Design-Like Exploratory Visual Data Analysis with Generative AI through Rapid Prototyping, Iteration and CurationSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Complex data analysis inherently seeks unexpected insights through exploratory visual analysis methods, transcending logical, step-by-step processing. However, existing interfaces such as notebooks and dashboards have limitations in exploration and comparison for visual data analysis. Addressing these limitations, we introduce a "design-like" intelligent canvas environment integrating generative AI into data analysis, offering rapid prototyping, iteration, and comparative visualization management. Our dual contributions include the integration of generative AI components into a canvas interface, and empirical findings from a user study (N=10) evaluating the effectiveness of the canvas interface.
- [1442] arXiv:2402.08831 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: eCeLLM: Generalizing Large Language Models for E-commerce from Large-scale, High-quality Instruction DataComments: Bo Peng and Xinyi Ling contributed equally to this paperSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract: With tremendous efforts on developing effective e-commerce models, conventional e-commerce models show limited success in generalist e-commerce modeling, and suffer from unsatisfactory performance on new users and new products - a typical out-of-domain generalization challenge. Meanwhile, large language models (LLMs) demonstrate outstanding performance in generalist modeling and out-of-domain generalizability in many fields. Toward fully unleashing their power for e-commerce, in this paper, we construct ECInstruct, the first open-sourced, large-scale, and high-quality benchmark instruction dataset for e-commerce. Leveraging ECInstruct, we develop eCeLLM, a series of e-commerce LLMs, by instruction-tuning general-purpose LLMs. Our comprehensive experiments and evaluation demonstrate that eCeLLM models substantially outperform baseline models, including the most advanced GPT-4, and the state-of-the-art task-specific models in in-domain evaluation. Moreover, eCeLLM exhibits excellent generalizability to out-of-domain settings, including unseen products and unseen instructions, highlighting its superiority as a generalist e-commerce model. Both the ECInstruct dataset and the eCeLLM models show great potential in empowering versatile and effective LLMs for e-commerce. ECInstruct and eCeLLM models are publicly accessible through this https URL .
- [1443] arXiv:2402.08832 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Intelligent Agricultural Management Considering N$_2$O Emission and Climate Variability with UncertaintiesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: This study examines how artificial intelligence (AI), especially Reinforcement Learning (RL), can be used in farming to boost crop yields, fine-tune nitrogen use and watering, and reduce nitrate runoff and greenhouse gases, focusing on Nitrous Oxide (N$_2$O) emissions from soil. Facing climate change and limited agricultural knowledge, we use Partially Observable Markov Decision Processes (POMDPs) with a crop simulator to model AI agents' interactions with farming environments. We apply deep Q-learning with Recurrent Neural Network (RNN)-based Q networks for training agents on optimal actions. Also, we develop Machine Learning (ML) models to predict N$_2$O emissions, integrating these predictions into the simulator. Our research tackles uncertainties in N$_2$O emission estimates with a probabilistic ML approach and climate variability through a stochastic weather model, offering a range of emission outcomes to improve forecast reliability and decision-making. By incorporating climate change effects, we enhance agents' climate adaptability, aiming for resilient agricultural practices. Results show these agents can align crop productivity with environmental concerns by penalizing N$_2$O emissions, adapting effectively to climate shifts like warmer temperatures and less rain. This strategy improves farm management under climate change, highlighting AI's role in sustainable agriculture.
- [1444] arXiv:2402.08846 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: An Embarrassingly Simple Approach for LLM with Strong ASR CapacityZiyang Ma , Guanrou Yang , Yifan Yang , Zhifu Gao , Jiaming Wang , Zhihao Du , Fan Yu , Qian Chen , Siqi Zheng , Shiliang Zhang , Xie ChenComments: Working in progress and will open-source soonSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract: In this paper, we focus on solving one of the most important tasks in the field of speech processing, i.e., automatic speech recognition (ASR), with speech foundation encoders and large language models (LLM). Recent works have complex designs such as compressing the output temporally for the speech encoder, tackling modal alignment for the projector, and utilizing parameter-efficient fine-tuning for the LLM. We found that delicate designs are not necessary, while an embarrassingly simple composition of off-the-shelf speech encoder, LLM, and the only trainable linear projector is competent for the ASR task. To be more specific, we benchmark and explore various combinations of LLMs and speech encoders, leading to the optimal LLM-based ASR system, which we call SLAM-ASR. The proposed SLAM-ASR provides a clean setup and little task-specific design, where only the linear projector is trained. To the best of our knowledge, SLAM-ASR achieves the best performance on the Librispeech benchmark among LLM-based ASR models and even outperforms the latest LLM-based audio-universal model trained on massive pair data. Finally, we explore the capability emergence of LLM-based ASR in the process of modal alignment. We hope that our study can facilitate the research on extending LLM with cross-modality capacity and shed light on the LLM-based ASR community.
- [1445] arXiv:2402.08848 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Hybrid Inverse Reinforcement LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The inverse reinforcement learning approach to imitation learning is a double-edged sword. On the one hand, it can enable learning from a smaller number of expert demonstrations with more robustness to error compounding than behavioral cloning approaches. On the other hand, it requires that the learner repeatedly solve a computationally expensive reinforcement learning (RL) problem. Often, much of this computation is wasted searching over policies very dissimilar to the expert's. In this work, we propose using hybrid RL -- training on a mixture of online and expert data -- to curtail unnecessary exploration. Intuitively, the expert data focuses the learner on good states during training, which reduces the amount of exploration required to compute a strong policy. Notably, such an approach doesn't need the ability to reset the learner to arbitrary states in the environment, a requirement of prior work in efficient inverse RL. More formally, we derive a reduction from inverse RL to expert-competitive RL (rather than globally optimal RL) that allows us to dramatically reduce interaction during the inner policy search loop while maintaining the benefits of the IRL approach. This allows us to derive both model-free and model-based hybrid inverse RL algorithms with strong policy performance guarantees. Empirically, we find that our approaches are significantly more sample efficient than standard inverse RL and several other baselines on a suite of continuous control tasks.
- [1446] arXiv:2402.08855 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: GhostWriter: Augmenting Collaborative Human-AI Writing Experiences Through Personalization and AgencyComments: 29 pages, 12 figuresSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Large language models (LLMs) are becoming more prevalent and have found a ubiquitous use in providing different forms of writing assistance. However, LLM-powered writing systems can frustrate users due to their limited personalization and control, which can be exacerbated when users lack experience with prompt engineering. We see design as one way to address these challenges and introduce GhostWriter, an AI-enhanced writing design probe where users can exercise enhanced agency and personalization. GhostWriter leverages LLMs to learn the user's intended writing style implicitly as they write, while allowing explicit teaching moments through manual style edits and annotations. We study 18 participants who use GhostWriter on two different writing tasks, observing that it helps users craft personalized text generations and empowers them by providing multiple ways to control the system's writing style. From this study, we present insights regarding people's relationship with AI-assisted writing and offer design recommendations for future work.
- [1447] arXiv:2402.08876 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: DUDF: Differentiable Unsigned Distance Fields with Hyperbolic ScalingSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Graphics (cs.GR)
Abstract: In recent years, there has been a growing interest in training Neural Networks to approximate Unsigned Distance Fields (UDFs) for representing open surfaces in the context of 3D reconstruction. However, UDFs are non-differentiable at the zero level set which leads to significant errors in distances and gradients, generally resulting in fragmented and discontinuous surfaces. In this paper, we propose to learn a hyperbolic scaling of the unsigned distance field, which defines a new Eikonal problem with distinct boundary conditions. This allows our formulation to integrate seamlessly with state-of-the-art continuously differentiable implicit neural representation networks, largely applied in the literature to represent signed distance fields. Our approach not only addresses the challenge of open surface representation but also demonstrates significant improvement in reconstruction quality and training performance. Moreover, the unlocked field's differentiability allows the accurate computation of essential topological properties such as normal directions and curvatures, pervasive in downstream tasks such as rendering. Through extensive experiments, we validate our approach across various data sets and against competitive baselines. The results demonstrate enhanced accuracy and up to an order of magnitude increase in speed compared to previous methods.
- [1448] arXiv:2402.08907 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Subgraph Pooling: Tackling Negative Transfer on GraphsComments: Accepted by IJCAI 24Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Abstract: Transfer learning aims to enhance performance on a target task by using knowledge from related tasks. However, when the source and target tasks are not closely aligned, it can lead to reduced performance, known as negative transfer. Unlike in image or text data, we find that negative transfer could commonly occur in graph-structured data, even when source and target graphs have semantic similarities. Specifically, we identify that structural differences significantly amplify the dissimilarities in the node embeddings across graphs. To mitigate this, we bring a new insight in this paper: for semantically similar graphs, although structural differences lead to significant distribution shift in node embeddings, their impact on subgraph embeddings could be marginal. Building on this insight, we introduce Subgraph Pooling (SP) by aggregating nodes sampled from a k-hop neighborhood and Subgraph Pooling++ (SP++) by a random walk, to mitigate the impact of graph structural differences on knowledge transfer. We theoretically analyze the role of SP in reducing graph discrepancy and conduct extensive experiments to evaluate its superiority under various settings. The proposed SP methods are effective yet elegant, which can be easily applied on top of any backbone Graph Neural Networks (GNNs). Our code and data are available at: this https URL .
- [1449] arXiv:2402.08918 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Graph Inference Acceleration by Learning MLPs on Graphs without SupervisionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Abstract: Graph Neural Networks (GNNs) have demonstrated effectiveness in various graph learning tasks, yet their reliance on message-passing constraints their deployment in latency-sensitive applications such as financial fraud detection. Recent works have explored distilling knowledge from GNNs to Multi-Layer Perceptrons (MLPs) to accelerate inference. However, this task-specific supervised distillation limits generalization to unseen nodes, which are prevalent in latency-sensitive applications. To this end, we present \textbf{\textsc{SimMLP}}, a \textbf{\textsc{Sim}}ple yet effective framework for learning \textbf{\textsc{MLP}}s on graphs without supervision, to enhance generalization. \textsc{SimMLP} employs self-supervised alignment between GNNs and MLPs to capture the fine-grained and generalizable correlation between node features and graph structures, and proposes two strategies to alleviate the risk of trivial solutions. Theoretically, we comprehensively analyze \textsc{SimMLP} to demonstrate its equivalence to GNNs in the optimal case and its generalization capability. Empirically, \textsc{SimMLP} outperforms state-of-the-art baselines, especially in settings with unseen nodes. In particular, it obtains significant performance gains {\bf (7$\sim$26\%)} over MLPs and inference acceleration over GNNs {\bf (90$\sim$126$\times$)} on large-scale graph datasets. Our codes are available at: \url{ this https URL }.
- [1450] arXiv:2402.08921 (cross-list from cs.IR) [ pdf , ps , html , other ]
-
Title: Enhancing ID and Text Fusion via Alternative Training in Session-based RecommendationSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI)
Abstract: Session-based recommendation has gained increasing attention in recent years, with its aim to offer tailored suggestions based on users' historical behaviors within sessions.
To advance this field, a variety of methods have been developed, with ID-based approaches typically demonstrating promising performance. However, these methods often face challenges with long-tail items and overlook other rich forms of information, notably valuable textual semantic information. To integrate text information, various methods have been introduced, mostly following a naive fusion framework. Surprisingly, we observe that fusing these two modalities does not consistently outperform the best single modality by following the naive fusion framework. Further investigation reveals an potential imbalance issue in naive fusion, where the ID dominates and text modality is undertrained. This suggests that the unexpected observation may stem from naive fusion's failure to effectively balance the two modalities, often over-relying on the stronger ID modality. This insight suggests that naive fusion might not be as effective in combining ID and text as previously expected. To address this, we propose a novel alternative training strategy AlterRec. It separates the training of ID and text, thereby avoiding the imbalance issue seen in naive fusion. Additionally, AlterRec designs a novel strategy to facilitate the interaction between the two modalities, enabling them to mutually learn from each other and integrate the text more effectively. Comprehensive experiments demonstrate the effectiveness of AlterRec in session-based recommendation. The implementation is available at this https URL . - [1451] arXiv:2402.08925 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human PreferencesSouradip Chakraborty , Jiahao Qiu , Hui Yuan , Alec Koppel , Furong Huang , Dinesh Manocha , Amrit Singh Bedi , Mengdi WangSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
Abstract: Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a singular reward model derived from preference data. However, such an approach overlooks the rich diversity of human preferences inherent in data collected from multiple users. In this work, we first derive an impossibility result of alignment with single reward RLHF, thereby highlighting its insufficiency in representing diverse human preferences. To provide an equitable solution to the problem, we learn a mixture of preference distributions via an expectation-maximization algorithm and propose a MaxMin alignment objective for policy learning inspired by the Egalitarian principle in social choice theory to better represent diverse human preferences. We elucidate the connection of our proposed approach to distributionally robust optimization and general utility RL, thereby highlighting the generality and robustness of our proposed solution. We present comprehensive experimental results on small-scale (GPT-2) and large-scale language models (with Tulu2-7B) and show the efficacy of the proposed approach in the presence of diversity among human preferences. Our algorithm achieves an average improvement of more than 16% in win-rates over conventional RLHF algorithms and improves the win-rate (accuracy) for minority groups by over 33% without compromising the performance of majority groups, showcasing the robustness and fairness of our approach. We remark that our findings in this work are not only limited to language models but also extend to reinforcement learning in general.
- [1452] arXiv:2402.08958 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Towards Next-Level Post-Training Quantization of Hyper-Scale TransformersComments: 17 pages, under reviewSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: With the increasing complexity of generative AI models, post-training quantization (PTQ) has emerged as a promising solution for deploying hyper-scale models on edge devices such as mobile devices and TVs. Existing PTQ schemes, however, consume considerable time and resources, which could be a bottleneck in real situations where frequent model updates and multiple hyper-parameter tunings are required. As a cost-effective alternative, one-shot PTQ schemes have been proposed. Still, the performance is somewhat limited because they cannot consider the inter-layer dependency within the attention module, which is a very important feature of Transformers. In this paper, we thus propose a novel PTQ algorithm that balances accuracy and efficiency. The key idea of the proposed algorithm called aespa is to perform quantization layer-wise for efficiency while considering cross-layer dependency to preserve the attention score. Through extensive experiments on various language models and complexity analysis, we demonstrate that aespa is accurate and efficient in quantizing Transformer models.
- [1453] arXiv:2402.08960 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Open-Vocabulary Segmentation with Unpaired Mask-Text SupervisionComments: 23 pages, 17 figures, 5 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Contemporary cutting-edge open-vocabulary segmentation approaches commonly rely on image-mask-text triplets, yet this restricted annotation is labour-intensive and encounters scalability hurdles in complex real-world scenarios. Although some methods are proposed to reduce the annotation cost with only text supervision, the incompleteness of supervision severely limits the versatility and performance. In this paper, we liberate the strict correspondence between masks and texts by using independent image-mask and image-text pairs, which can be easily collected respectively. With this unpaired mask-text supervision, we propose a new weakly-supervised open-vocabulary segmentation framework (Uni-OVSeg) that leverages confident pairs of mask predictions and entities in text descriptions. Using the independent image-mask and image-text pairs, we predict a set of binary masks and associate them with entities by resorting to the CLIP embedding space. However, the inherent noise in the correspondence between masks and entities poses a significant challenge when obtaining reliable pairs. In light of this, we advocate using the large vision-language model (LVLM) to refine text descriptions and devise a multi-scale ensemble to stablise the matching between masks and entities. Compared to text-only weakly-supervised methods, our Uni-OVSeg achieves substantial improvements of 15.5% mIoU on the ADE20K datasets, and even surpasses fully-supervised methods on the challenging PASCAL Context-459 dataset.
- [1454] arXiv:2402.08963 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: DUEL: Duplicate Elimination on Active Memory for Self-Supervised Class-Imbalanced LearningComments: Accepted as a full paper at AAAI 2024: The 38th Annual AAAI Conference on Artificial Intelligence (Main Tech Track). 7 pages (main paper), 2 pages (references), 11 pages (appendix) eachSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Recent machine learning algorithms have been developed using well-curated datasets, which often require substantial cost and resources. On the other hand, the direct use of raw data often leads to overfitting towards frequently occurring class information. To address class imbalances cost-efficiently, we propose an active data filtering process during self-supervised pre-training in our novel framework, Duplicate Elimination (DUEL). This framework integrates an active memory inspired by human working memory and introduces distinctiveness information, which measures the diversity of the data in the memory, to optimize both the feature extractor and the memory. The DUEL policy, which replaces the most duplicated data with new samples, aims to enhance the distinctiveness information in the memory and thereby mitigate class imbalances. We validate the effectiveness of the DUEL framework in class-imbalanced environments, demonstrating its robustness and providing reliable results in downstream tasks. We also analyze the role of the DUEL policy in the training process through various metrics and visualizations.
- [1455] arXiv:2402.08975 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Research and application of Transformer based anomaly detection model: A literature reviewComments: 77 pages, 11 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Transformer, as one of the most advanced neural network models in Natural Language Processing (NLP), exhibits diverse applications in the field of anomaly detection. To inspire research on Transformer-based anomaly detection, this review offers a fresh perspective on the concept of anomaly detection. We explore the current challenges of anomaly detection and provide detailed insights into the operating principles of Transformer and its variants in anomaly detection tasks. Additionally, we delineate various application scenarios for Transformer-based anomaly detection models and discuss the datasets and evaluation metrics employed. Furthermore, this review highlights the key challenges in Transformer-based anomaly detection research and conducts a comprehensive analysis of future research trends in this domain. The review includes an extensive compilation of over 100 core references related to Transformer-based anomaly detection. To the best of our knowledge, this is the first comprehensive review that focuses on the research related to Transformer in the context of anomaly detection. We hope that this paper can provide detailed technical information to researchers interested in Transformer-based anomaly detection tasks.
- [1456] arXiv:2402.08979 (cross-list from eess.SY) [ pdf , ps , other ]
-
Title: Learning-enabled Flexible Job-shop Scheduling for Scalable Smart ManufacturingSubjects: Systems and Control (eess.SY) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: In smart manufacturing systems (SMSs), flexible job-shop scheduling with transportation constraints (FJSPT) is essential to optimize solutions for maximizing productivity, considering production flexibility based on automated guided vehicles (AGVs). Recent developments in deep reinforcement learning (DRL)-based methods for FJSPT have encountered a scale generalization challenge. These methods underperform when applied to environment at scales different from their training set, resulting in low-quality solutions. To address this, we introduce a novel graph-based DRL method, named the Heterogeneous Graph Scheduler (HGS). Our method leverages locally extracted relational knowledge among operations, machines, and vehicle nodes for scheduling, with a graph-structured decision-making framework that reduces encoding complexity and enhances scale generalization. Our performance evaluation, conducted with benchmark datasets, reveals that the proposed method outperforms traditional dispatching rules, meta-heuristics, and existing DRL-based approaches in terms of makespan performance, even on large-scale instances that have not been experienced during training.
- [1457] arXiv:2402.08982 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: MEL: Efficient Multi-Task Evolutionary Learning for High-Dimensional Feature SelectionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Abstract: Feature selection is a crucial step in data mining to enhance model performance by reducing data dimensionality. However, the increasing dimensionality of collected data exacerbates the challenge known as the "curse of dimensionality", where computation grows exponentially with the number of dimensions. To tackle this issue, evolutionary computational (EC) approaches have gained popularity due to their simplicity and applicability. Unfortunately, the diverse designs of EC methods result in varying abilities to handle different data, often underutilizing and not sharing information effectively. In this paper, we propose a novel approach called PSO-based Multi-task Evolutionary Learning (MEL) that leverages multi-task learning to address these challenges. By incorporating information sharing between different feature selection tasks, MEL achieves enhanced learning ability and efficiency. We evaluate the effectiveness of MEL through extensive experiments on 22 high-dimensional datasets. Comparing against 24 EC approaches, our method exhibits strong competitiveness. Additionally, we have open-sourced our code on GitHub at this https URL .
- [1458] arXiv:2402.08983 (cross-list from cs.CR) [ pdf , ps , html , other ]
-
Title: SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware DecodingSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: As large language models (LLMs) become increasingly integrated into real-world applications such as code generation and chatbot assistance, extensive efforts have been made to align LLM behavior with human values, including safety. Jailbreak attacks, aiming to provoke unintended and unsafe behaviors from LLMs, remain a significant/leading LLM safety threat. In this paper, we aim to defend LLMs against jailbreak attacks by introducing SafeDecoding, a safety-aware decoding strategy for LLMs to generate helpful and harmless responses to user queries. Our insight in developing SafeDecoding is based on the observation that, even though probabilities of tokens representing harmful contents outweigh those representing harmless responses, safety disclaimers still appear among the top tokens after sorting tokens by probability in descending order. This allows us to mitigate jailbreak attacks by identifying safety disclaimers and amplifying their token probabilities, while simultaneously attenuating the probabilities of token sequences that are aligned with the objectives of jailbreak attacks. We perform extensive experiments on five LLMs using six state-of-the-art jailbreak attacks and four benchmark datasets. Our results show that SafeDecoding significantly reduces the attack success rate and harmfulness of jailbreak attacks without compromising the helpfulness of responses to benign user queries. SafeDecoding outperforms six defense methods.
- [1459] arXiv:2402.08994 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: CLIP-MUSED: CLIP-Guided Multi-Subject Visual Neural Information Semantic DecodingComments: Accepted by ICLR2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: The study of decoding visual neural information faces challenges in generalizing single-subject decoding models to multiple subjects, due to individual differences. Moreover, the limited availability of data from a single subject has a constraining impact on model performance. Although prior multi-subject decoding methods have made significant progress, they still suffer from several limitations, including difficulty in extracting global neural response features, linear scaling of model parameters with the number of subjects, and inadequate characterization of the relationship between neural responses of different subjects to various stimuli. To overcome these limitations, we propose a CLIP-guided Multi-sUbject visual neural information SEmantic Decoding (CLIP-MUSED) method. Our method consists of a Transformer-based feature extractor to effectively model global neural representations. It also incorporates learnable subject-specific tokens that facilitates the aggregation of multi-subject data without a linear increase of parameters. Additionally, we employ representational similarity analysis (RSA) to guide token representation learning based on the topological relationship of visual stimuli in the representation space of CLIP, enabling full characterization of the relationship between neural responses of different subjects under different stimuli. Finally, token representations are used for multi-subject semantic decoding. Our proposed method outperforms single-subject decoding methods and achieves state-of-the-art performance among the existing multi-subject methods on two fMRI datasets. Visualization results provide insights into the effectiveness of our proposed method. Code is available at this https URL .
- [1460] arXiv:2402.08995 (cross-list from cs.HC) [ pdf , ps , html , other ]
-
Title: AgentLens: Visual Analysis for Agent Behaviors in LLM-based Autonomous SystemsSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Recently, Large Language Model based Autonomous system(LLMAS) has gained great popularity for its potential to simulate complicated behaviors of human societies. One of its main challenges is to present and analyze the dynamic events evolution of LLMAS. In this work, we present a visualization approach to explore detailed statuses and agents' behavior within LLMAS. We propose a general pipeline that establishes a behavior structure from raw LLMAS execution events, leverages a behavior summarization algorithm to construct a hierarchical summary of the entire structure in terms of time sequence, and a cause trace method to mine the causal relationship between agent behaviors. We then develop AgentLens, a visual analysis system that leverages a hierarchical temporal visualization for illustrating the evolution of LLMAS, and supports users to interactively investigate details and causes of agents' behaviors. Two usage scenarios and a user study demonstrate the effectiveness and usability of our AgentLens.
- [1461] arXiv:2402.09015 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Towards better Human-Agent Alignment: Assessing Task Utility in LLM-Powered ApplicationsNegar Arabzadeh , Julia Kiseleva , Qingyun Wu , Chi Wang , Ahmed Awadallah , Victor Dibia , Adam Fourney , Charles ClarkeSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The rapid development in the field of Large Language Models (LLMs) has led to a surge in applications that facilitate collaboration among multiple agents to assist humans in their daily tasks. However, a significant gap remains in assessing whether LLM-powered applications genuinely enhance user experience and task execution efficiency. This highlights the pressing need for methods to verify utility of LLM-powered applications, particularly by ensuring alignment between the application's functionality and end-user needs. We introduce AgentEval provides an implementation for the math problems, a novel framework designed to simplify the utility verification process by automatically proposing a set of criteria tailored to the unique purpose of any given application. This allows for a comprehensive assessment, quantifying the utility of an application against the suggested criteria. We present a comprehensive analysis of the robustness of quantifier's work.
- [1462] arXiv:2402.09023 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: Review-Incorporated Model-Agnostic Profile Injection Attacks on Recommender SystemsComments: Accepted by ICDM 2023Subjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: Recent studies have shown that recommender systems (RSs) are highly vulnerable to data poisoning attacks. Understanding attack tactics helps improve the robustness of RSs. We intend to develop efficient attack methods that use limited resources to generate high-quality fake user profiles to achieve 1) transferability among black-box RSs 2) and imperceptibility among detectors. In order to achieve these goals, we introduce textual reviews of products to enhance the generation quality of the profiles. Specifically, we propose a novel attack framework named R-Trojan, which formulates the attack objectives as an optimization problem and adopts a tailored transformer-based generative adversarial network (GAN) to solve it so that high-quality attack profiles can be produced. Comprehensive experiments on real-world datasets demonstrate that R-Trojan greatly outperforms state-of-the-art attack methods on various victim RSs under black-box settings and show its good imperceptibility.
- [1463] arXiv:2402.09034 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Enhancing Sequential Model Performance with Squared Sigmoid TanH (SST) Activation Under Data ConstraintsComments: 10 pages,9 figures, Submitted to IJCAI 2024 conferenceSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Activation functions enable neural networks to learn complex representations by introducing non-linearities. While feedforward models commonly use rectified linear units, sequential models like recurrent neural networks, long short-term memory (LSTMs) and gated recurrent units (GRUs) still rely on Sigmoid and TanH activation functions. However, these classical activation functions often struggle to model sparse patterns when trained on small sequential datasets to effectively capture temporal dependencies. To address this limitation, we propose squared Sigmoid TanH (SST) activation specifically tailored to enhance the learning capability of sequential models under data constraints. SST applies mathematical squaring to amplify differences between strong and weak activations as signals propagate over time, facilitating improved gradient flow and information filtering. We evaluate SST-powered LSTMs and GRUs for diverse applications, such as sign language recognition, regression, and time-series classification tasks, where the dataset is limited. Our experiments demonstrate that SST models consistently outperform RNN-based models with baseline activations, exhibiting improved test accuracy.
- [1464] arXiv:2402.09055 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Comment-aided Video-Language Alignment via Contrastive Pre-training for Short-form Video Humor DetectionComments: Accepted by ICMR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: The growing importance of multi-modal humor detection within affective computing correlates with the expanding influence of short-form video sharing on social media platforms. In this paper, we propose a novel two-branch hierarchical model for short-form video humor detection (SVHD), named Comment-aided Video-Language Alignment (CVLA) via data-augmented multi-modal contrastive pre-training. Notably, our CVLA not only operates on raw signals across various modal channels but also yields an appropriate multi-modal representation by aligning the video and language components within a consistent semantic space. The experimental results on two humor detection datasets, including DY11k and UR-FUNNY, demonstrate that CVLA dramatically outperforms state-of-the-art and several competitive baseline approaches. Our dataset, code and model release at this https URL .
- [1465] arXiv:2402.09059 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: I can't see it but I can Fine-tune it: On Encrypted Fine-tuning of Transformers using Fully Homomorphic EncryptionComments: Accepted for the presentation at PPAI @The 38th Annual AAAI Conference on Artificial Intelligence 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: In today's machine learning landscape, fine-tuning pretrained transformer models has emerged as an essential technique, particularly in scenarios where access to task-aligned training data is limited. However, challenges surface when data sharing encounters obstacles due to stringent privacy regulations or user apprehension regarding personal information disclosure. Earlier works based on secure multiparty computation (SMC) and fully homomorphic encryption (FHE) for privacy-preserving machine learning (PPML) focused more on privacy-preserving inference than privacy-preserving training. In response, we introduce BlindTuner, a privacy-preserving fine-tuning system that enables transformer training exclusively on homomorphically encrypted data for image classification. Our extensive experimentation validates BlindTuner's effectiveness by demonstrating comparable accuracy to non-encrypted models. Notably, our findings highlight a substantial speed enhancement of 1.5x to 600x over previous work in this domain.
- [1466] arXiv:2402.09066 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Solid Waste Detection in Remote Sensing Images: A SurveySubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The detection and characterization of illegal solid waste disposal sites are essential for environmental protection, particularly for mitigating pollution and health hazards. Improperly managed landfills contaminate soil and groundwater via rainwater infiltration, posing threats to both animals and humans. Traditional landfill identification approaches, such as on-site inspections, are time-consuming and expensive. Remote sensing is a cost-effective solution for the identification and monitoring of solid waste disposal sites that enables broad coverage and repeated acquisitions over time. Earth Observation (EO) satellites, equipped with an array of sensors and imaging capabilities, have been providing high-resolution data for several decades. Researchers proposed specialized techniques that leverage remote sensing imagery to perform a range of tasks such as waste site detection, dumping site monitoring, and assessment of suitable locations for new landfills. This review aims to provide a detailed illustration of the most relevant proposals for the detection and monitoring of solid waste sites by describing and comparing the approaches, the implemented techniques, and the employed data. Furthermore, since the data sources are of the utmost importance for developing an effective solid waste detection model, a comprehensive overview of the satellites and publicly available data sets is presented. Finally, this paper identifies the open issues in the state-of-the-art and discusses the relevant research directions for reducing the costs and improving the effectiveness of novel solid waste detection methods.
- [1467] arXiv:2402.09078 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Exploiting Estimation Bias in Deep Double Q-Learning for Actor-Critic MethodsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: This paper introduces innovative methods in Reinforcement Learning (RL), focusing on addressing and exploiting estimation biases in Actor-Critic methods for continuous control tasks, using Deep Double Q-Learning. We propose two novel algorithms: Expectile Delayed Deep Deterministic Policy Gradient (ExpD3) and Bias Exploiting - Twin Delayed Deep Deterministic Policy Gradient (BE-TD3). ExpD3 aims to reduce overestimation bias with a single $Q$ estimate, offering a balance between computational efficiency and performance, while BE-TD3 is designed to dynamically select the most advantageous estimation bias during training. Our extensive experiments across various continuous control tasks demonstrate the effectiveness of our approaches. We show that these algorithms can either match or surpass existing methods like TD3, particularly in environments where estimation biases significantly impact learning. The results underline the importance of bias exploitation in improving policy learning in RL.
- [1468] arXiv:2402.09084 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Sobolev Training for Operator LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: This study investigates the impact of Sobolev Training on operator learning frameworks for improving model performance. Our research reveals that integrating derivative information into the loss function enhances the training process, and we propose a novel framework to approximate derivatives on irregular meshes in operator learning. Our findings are supported by both experimental evidence and theoretical analysis. This demonstrates the effectiveness of Sobolev Training in approximating the solution operators between infinite-dimensional spaces.
- [1469] arXiv:2402.09091 (cross-list from cs.CR) [ pdf , ps , html , other ]
-
Title: Play Guessing Game with LLM: Indirect Jailbreak Attack with Implicit CluesComments: 13 pages, 6 figuresSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Abstract: With the development of LLMs, the security threats of LLMs are getting more and more attention. Numerous jailbreak attacks have been proposed to assess the security defense of LLMs. Current jailbreak attacks primarily utilize scenario camouflage techniques. However their explicitly mention of malicious intent will be easily recognized and defended by LLMs. In this paper, we propose an indirect jailbreak attack approach, Puzzler, which can bypass the LLM's defense strategy and obtain malicious response by implicitly providing LLMs with some clues about the original malicious query. In addition, inspired by the wisdom of "When unable to attack, defend" from Sun Tzu's Art of War, we adopt a defensive stance to gather clues about the original malicious query through LLMs. Extensive experimental results show that Puzzler achieves a query success rate of 96.6% on closed-source LLMs, which is 57.9%-82.7% higher than baselines. Furthermore, when tested against the state-of-the-art jailbreak detection approaches, Puzzler proves to be more effective at evading detection compared to baselines.
- [1470] arXiv:2402.09097 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: A Digital Twin prototype for traffic sign recognition of a learning-enabled autonomous vehicleMohamed AbdElSalam , Loai Ali , Saddek Bensalem , Weicheng He , Panagiotis Katsaros , Nikolaos Kekatos , Doron Peled , Anastasios Temperekidis , Changshun WuSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract: In this paper, we present a novel digital twin prototype for a learning-enabled self-driving vehicle. The primary objective of this digital twin is to perform traffic sign recognition and lane keeping. The digital twin architecture relies on co-simulation and uses the Functional Mock-up Interface and SystemC Transaction Level Modeling standards. The digital twin consists of four clients, i) a vehicle model that is designed in Amesim tool, ii) an environment model developed in Prescan, iii) a lane-keeping controller designed in Robot Operating System, and iv) a perception and speed control module developed in the formal modeling language of BIP (Behavior, Interaction, Priority). These clients interface with the digital twin platform, PAVE360-Veloce System Interconnect (PAVE360-VSI). PAVE360-VSI acts as the co-simulation orchestrator and is responsible for synchronization, interconnection, and data exchange through a server. The server establishes connections among the different clients and also ensures adherence to the Ethernet protocol. We conclude with illustrative digital twin simulations and recommendations for future work.
- [1471] arXiv:2402.09109 (cross-list from cs.AR) [ pdf , ps , other ]
-
Title: Stochastic Spiking Attention: Accelerating Attention with Stochastic Computing in Spiking NetworksSubjects: Hardware Architecture (cs.AR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Signal Processing (eess.SP)
Abstract: Spiking Neural Networks (SNNs) have been recently integrated into Transformer architectures due to their potential to reduce computational demands and to improve power efficiency. Yet, the implementation of the attention mechanism using spiking signals on general-purpose computing platforms remains inefficient. In this paper, we propose a novel framework leveraging stochastic computing (SC) to effectively execute the dot-product attention for SNN-based Transformers. We demonstrate that our approach can achieve high classification accuracy ($83.53\%$) on CIFAR-10 within 10 time steps, which is comparable to the performance of a baseline artificial neural network implementation ($83.66\%$). We estimate that the proposed SC approach can lead to over $6.3\times$ reduction in computing energy and $1.7\times$ reduction in memory access costs for a digital CMOS-based ASIC design. We experimentally validate our stochastic attention block design through an FPGA implementation, which is shown to achieve $48\times$ lower latency as compared to a GPU implementation, while consuming $15\times$ less power.
- [1472] arXiv:2402.09126 (cross-list from cs.DC) [ pdf , ps , html , other ]
-
Title: MPIrigen: MPI Code Generation through Domain-Specific Language ModelsNadav Schneider , Niranjan Hasabnis , Vy A. Vo , Tal Kadosh , Neva Krien , Mihai Capotă , Guy Tamir , Ted Willke , Nesreen Ahmed , Yuval Pinter , Timothy Mattson , Gal OrenSubjects: Distributed, Parallel, and Cluster Computing (cs.DC) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Software Engineering (cs.SE)
Abstract: The imperative need to scale computation across numerous nodes highlights the significance of efficient parallel computing, particularly in the realm of Message Passing Interface (MPI) integration. The challenging parallel programming task of generating MPI-based parallel programs has remained unexplored. This study first investigates the performance of state-of-the-art language models in generating MPI-based parallel programs. Findings reveal that widely used models such as GPT-3.5 and PolyCoder (specialized multi-lingual code models) exhibit notable performance degradation, when generating MPI-based programs compared to general-purpose programs. In contrast, domain-specific models such as MonoCoder, which are pretrained on MPI-related programming languages of C and C++, outperform larger models. Subsequently, we introduce a dedicated downstream task of MPI-based program generation by fine-tuning MonoCoder on HPCorpusMPI. We call the resulting model as MPIrigen. We propose an innovative preprocessing for completion only after observing the whole code, thus enabling better completion with a wider context. Comparative analysis against GPT-3.5 zero-shot performance, using a novel HPC-oriented evaluation method, demonstrates that MPIrigen excels in generating accurate MPI functions up to 0.8 accuracy in location and function predictions, and with more than 0.9 accuracy for argument predictions. The success of this tailored solution underscores the importance of domain-specific fine-tuning in optimizing language models for parallel computing code generation, paving the way for a new generation of automatic parallelization tools. The sources of this work are available at our GitHub MPIrigen repository: this https URL
- [1473] arXiv:2402.09129 (cross-list from cs.GT) [ pdf , ps , other ]
-
Title: Optimal Automated Market Makers: Differentiable Economics and Strong DualitySubjects: Computer Science and Game Theory (cs.GT) ; Artificial Intelligence (cs.AI); Theoretical Economics (econ.TH); Trading and Market Microstructure (q-fin.TR)
Abstract: The role of a market maker is to simultaneously offer to buy and sell quantities of goods, often a financial asset such as a share, at specified prices. An automated market maker (AMM) is a mechanism that offers to trade according to some predetermined schedule; the best choice of this schedule depends on the market maker's goals. The literature on the design of AMMs has mainly focused on prediction markets with the goal of information elicitation. More recent work motivated by DeFi has focused instead on the goal of profit maximization, but considering only a single type of good (traded with a numeraire), including under adverse selection (Milionis et al. 2022). Optimal market making in the presence of multiple goods, including the possibility of complex bundling behavior, is not well understood. In this paper, we show that finding an optimal market maker is dual to an optimal transport problem, with specific geometric constraints on the transport plan in the dual. We show that optimal mechanisms for multiple goods and under adverse selection can take advantage of bundling, both improved prices for bundled purchases and sales as well as sometimes accepting payment "in kind." We present conjectures of optimal mechanisms in additional settings which show further complex behavior. From a methodological perspective, we make essential use of the tools of differentiable economics to generate conjectures of optimal mechanisms, and give a proof-of-concept for the use of such tools in guiding theoretical investigations.
- [1474] arXiv:2402.09136 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction TuningYejie Wang , Keqing He , Guanting Dong , Pei Wang , Weihao Zeng , Muxi Diao , Yutao Mou , Mengdi Zhang , Jingang Wang , Xunliang Cai , Weiran XuComments: 14 pages, 6 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Code Large Language Models (Code LLMs) have demonstrated outstanding performance in code-related tasks. Several instruction tuning approaches have been proposed to boost the code generation performance of pre-trained Code LLMs. In this paper, we introduce a diverse instruction model (DolphCoder) with self-evaluating for code generation. It learns diverse instruction targets and combines a code evaluation objective to enhance its code generation ability. Our model achieves superior performance on the HumanEval and MBPP benchmarks, demonstrating new insights for future code instruction tuning work. Our key findings are: (1) Augmenting more diverse responses with distinct reasoning paths increases the code capability of LLMs. (2) Improving one's ability to evaluate the correctness of code solutions also enhances their ability to create it.
- [1475] arXiv:2402.09141 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Advancing NLP Models with Strategic Text Augmentation: A Comprehensive Study of Augmentation Methods and Curriculum StrategiesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: This study conducts a thorough evaluation of text augmentation techniques across a variety of datasets and natural language processing (NLP) tasks to address the lack of reliable, generalized evidence for these methods. It examines the effectiveness of these techniques in augmenting training sets to improve performance in tasks such as topic classification, sentiment analysis, and offensive language detection. The research emphasizes not only the augmentation methods, but also the strategic order in which real and augmented instances are introduced during training. A major contribution is the development and evaluation of Modified Cyclical Curriculum Learning (MCCL) for augmented datasets, which represents a novel approach in the field. Results show that specific augmentation methods, especially when integrated with MCCL, significantly outperform traditional training approaches in NLP model performance. These results underscore the need for careful selection of augmentation techniques and sequencing strategies to optimize the balance between speed and quality improvement in various NLP tasks. The study concludes that the use of augmentation methods, especially in conjunction with MCCL, leads to improved results in various classification tasks, providing a foundation for future advances in text augmentation strategies in NLP.
- [1476] arXiv:2402.09177 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Leveraging the Context through Multi-Round Interactions for Jailbreaking AttacksComments: 29 pagesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Large Language Models (LLMs) are susceptible to Jailbreaking attacks, which aim to extract harmful information by subtly modifying the attack query. As defense mechanisms evolve, directly obtaining harmful information becomes increasingly challenging for Jailbreaking attacks. In this work, inspired by human practices of indirect context to elicit harmful information, we focus on a new attack form called Contextual Interaction Attack. The idea relies on the autoregressive nature of the generation process in LLMs. We contend that the prior context--the information preceding the attack query--plays a pivotal role in enabling potent Jailbreaking attacks. Specifically, we propose an approach that leverages preliminary question-answer pairs to interact with the LLM. By doing so, we guide the responses of the model toward revealing the 'desired' harmful information. We conduct experiments on four different LLMs and demonstrate the efficacy of this attack, which is black-box and can also transfer across LLMs. We believe this can lead to further developments and understanding of the context vector in LLMs.
- [1477] arXiv:2402.09193 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: (Ir)rationality and Cognitive Biases in Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Abstract: Do large language models (LLMs) display rational reasoning? LLMs have been shown to contain human biases due to the data they have been trained on; whether this is reflected in rational reasoning remains less clear. In this paper, we answer this question by evaluating seven language models using tasks from the cognitive psychology literature. We find that, like humans, LLMs display irrationality in these tasks. However, the way this irrationality is displayed does not reflect that shown by humans. When incorrect answers are given by LLMs to these tasks, they are often incorrect in ways that differ from human-like biases. On top of this, the LLMs reveal an additional layer of irrationality in the significant inconsistency of the responses. Aside from the experimental results, this paper seeks to make a methodological contribution by showing how we can assess and compare different capabilities of these types of models, in this case with respect to rational reasoning.
- [1478] arXiv:2402.09199 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Ten Words Only Still Help: Improving Black-Box AI-Generated Text Detection via Proxy-Guided Efficient Re-SamplingComments: 13 pages, 6 figures, 7 tablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: With the rapidly increasing application of large language models (LLMs), their abuse has caused many undesirable societal problems such as fake news, academic dishonesty, and information pollution. This makes AI-generated text (AIGT) detection of great importance. Among existing methods, white-box methods are generally superior to black-box methods in terms of performance and generalizability, but they require access to LLMs' internal states and are not applicable to black-box settings. In this paper, we propose to estimate word generation probabilities as pseudo white-box features via multiple re-sampling to help improve AIGT detection under the black-box setting. Specifically, we design POGER, a proxy-guided efficient re-sampling method, which selects a small subset of representative words (e.g., 10 words) for performing multiple re-sampling in black-box AIGT detection. Experiments on datasets containing texts from humans and seven LLMs show that POGER outperforms all baselines in macro F1 under black-box, partial white-box, and out-of-distribution settings and maintains lower re-sampling costs than its existing counterparts.
- [1479] arXiv:2402.09200 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: Discovering Command and Control (C2) Channels on Tor and Public Networks Using Reinforcement LearningCheng Wang , Christopher Redino , Abdul Rahman , Ryan Clark , Daniel Radke , Tyler Cody , Dhruv Nandakumar , Edward BowenSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: Command and control (C2) channels are an essential component of many types of cyber attacks, as they enable attackers to remotely control their malware-infected machines and execute harmful actions, such as propagating malicious code across networks, exfiltrating confidential data, or initiating distributed denial of service (DDoS) attacks. Identifying these C2 channels is therefore crucial in helping to mitigate and prevent cyber attacks. However, identifying C2 channels typically involves a manual process, requiring deep knowledge and expertise in cyber operations. In this paper, we propose a reinforcement learning (RL) based approach to automatically emulate C2 attack campaigns using both the normal (public) and the Tor networks. In addition, payload size and network firewalls are configured to simulate real-world attack scenarios. Results on a typical network configuration show that the RL agent can automatically discover resilient C2 attack paths utilizing both Tor-based and conventional communication channels, while also bypassing network firewalls.
- [1480] arXiv:2402.09205 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven AgentsCheng Qian , Bingxiang He , Zhong Zhuang , Jia Deng , Yujia Qin , Xin Cong , Zhong Zhang , Jie Zhou , Yankai Lin , Zhiyuan Liu , Maosong SunComments: 26 pages, 5 tables, 6 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Abstract: Current language model-driven agents often lack mechanisms for effective user participation, which is crucial given the vagueness commonly found in user instructions. Although adept at devising strategies and performing tasks, these agents struggle with seeking clarification and grasping precise user intentions. To bridge this gap, we introduce Intention-in-Interaction (IN3), a novel benchmark designed to inspect users' implicit intentions through explicit queries. Next, we propose the incorporation of model experts as the upstream in agent designs to enhance user-agent interaction. Employing IN3, we empirically train Mistral-Interact, a powerful model that proactively assesses task vagueness, inquires user intentions, and refines them into actionable goals before starting downstream agent task execution. Integrating it into the XAgent framework, we comprehensively evaluate the enhanced agent system regarding user instruction understanding and execution, revealing that our approach notably excels at identifying vague user tasks, recovering and summarizing critical missing information, setting precise and necessary agent execution goals, and minimizing redundant tool usage, thus boosting overall efficiency. All the data and codes are released.
- [1481] arXiv:2402.09211 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: DivaTrack: Diverse Bodies and Motions from Acceleration-Enhanced Three-Point TrackersComments: accepted to Eurographics 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Full-body avatar presence is crucial for immersive social and environmental interactions in digital reality. However, current devices only provide three six degrees of freedom (DOF) poses from the headset and two controllers (i.e. three-point trackers). Because it is a highly under-constrained problem, inferring full-body pose from these inputs is challenging, especially when supporting the full range of body proportions and use cases represented by the general population. In this paper, we propose a deep learning framework, DivaTrack, which outperforms existing methods when applied to diverse body sizes and activities. We augment the sparse three-point inputs with linear accelerations from Inertial Measurement Units (IMU) to improve foot contact prediction. We then condition the otherwise ambiguous lower-body pose with the predictions of foot contact and upper-body pose in a two-stage model. We further stabilize the inferred full-body pose in a wide range of configurations by learning to blend predictions that are computed in two reference frames, each of which is designed for different types of motions. We demonstrate the effectiveness of our design on a large dataset that captures 22 subjects performing challenging locomotion for three-point tracking, including lunges, hula-hooping, and sitting. As shown in a live demo using the Meta VR headset and Xsens IMUs, our method runs in real-time while accurately tracking a user's motion when they perform a diverse set of movements.
- [1482] arXiv:2402.09225 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Is my Data in your AI Model? Membership Inference Test with Application to Face ImagesDaniel DeAlcala , Aythami Morales , Gonzalo Mancera , Julian Fierrez , Ruben Tolosana , Javier Ortega-GarciaComments: 10 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: This paper introduces the Membership Inference Test (MINT), a novel approach that aims to empirically assess if specific data was used during the training of Artificial Intelligence (AI) models. Specifically, we propose two novel MINT architectures designed to learn the distinct activation patterns that emerge when an audited model is exposed to data used during its training process. The first architecture is based on a Multilayer Perceptron (MLP) network and the second one is based on Convolutional Neural Networks (CNNs). The proposed MINT architectures are evaluated on a challenging face recognition task, considering three state-of-the-art face recognition models. Experiments are carried out using six publicly available databases, comprising over 22 million face images in total. Also, different experimental scenarios are considered depending on the context available of the AI model to test. Promising results, up to 90% accuracy, are achieved using our proposed MINT approach, suggesting that it is possible to recognize if an AI model has been trained with specific data.
- [1483] arXiv:2402.09233 (cross-list from cs.RO) [ pdf , ps , html , other ]
-
Title: Design and Realization of a Benchmarking Testbed for Evaluating Autonomous Platooning AlgorithmsComments: To be published in International Symposium on Experimental Robotics, 2023Subjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Systems and Control (eess.SY); Optimization and Control (math.OC)
Abstract: Autonomous vehicle platoons present near- and long-term opportunities to enhance operational efficiencies and save lives. The past 30 years have seen rapid development in the autonomous driving space, enabling new technologies that will alleviate the strain placed on human drivers and reduce vehicle emissions. This paper introduces a testbed for evaluating and benchmarking platooning algorithms on 1/10th scale vehicles with onboard sensors. To demonstrate the testbed's utility, we evaluate three algorithms, linear feedback and two variations of distributed model predictive control, and compare their results on a typical platooning scenario where the lead vehicle tracks a reference trajectory that changes speed multiple times. We validate our algorithms in simulation to analyze the performance as the platoon size increases, and find that the distributed model predictive control algorithms outperform linear feedback on hardware and in simulation.
- [1484] arXiv:2402.09236 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation ModelsComments: 36 pagesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Statistics Theory (math.ST); Machine Learning (stat.ML)
Abstract: To build intelligent machine learning systems, there are two broad approaches. One approach is to build inherently interpretable models, as endeavored by the growing field of causal representation learning. The other approach is to build highly-performant foundation models and then invest efforts into understanding how they work. In this work, we relate these two approaches and study how to learn human-interpretable concepts from data. Weaving together ideas from both fields, we formally define a notion of concepts and show that they can be provably recovered from diverse data. Experiments on synthetic data and large language models show the utility of our unified approach.
- [1485] arXiv:2402.09246 (cross-list from cs.RO) [ pdf , ps , html , other ]
-
Title: Who Plays First? Optimizing the Order of Play in Stackelberg Games with Many RobotsSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Systems and Control (eess.SY); Optimization and Control (math.OC)
Abstract: We consider the multi-agent spatial navigation problem of computing the socially optimal order of play, i.e., the sequence in which the agents commit to their decisions, and its associated equilibrium in an N-player Stackelberg trajectory game. We model this problem as a mixed-integer optimization problem over the space of all possible Stackelberg games associated with the order of play's permutations. To solve the problem, we introduce Branch and Play (B&P), an efficient and exact algorithm that provably converges to a socially optimal order of play and its Stackelberg equilibrium. As a subroutine for B&P, we employ and extend sequential trajectory planning, i.e., a popular multi-agent control approach, to scalably compute valid local Stackelberg equilibria for any given order of play. We demonstrate the practical utility of B&P to coordinate air traffic control, swarm formation, and delivery vehicle fleets. We find that B&P consistently outperforms various baselines, and computes the socially optimal equilibrium.
- [1486] arXiv:2402.09251 (cross-list from physics.comp-ph) [ pdf , ps , other ]
-
Title: Universal Machine Learning Kohn-Sham Hamiltonian for MaterialsComments: 20 pages, 9 figuresSubjects: Computational Physics (physics.comp-ph) ; Materials Science (cond-mat.mtrl-sci); Artificial Intelligence (cs.AI)
Abstract: While density functional theory (DFT) serves as a prevalent computational approach in electronic structure calculations, its computational demands and scalability limitations persist. Recently, leveraging neural networks to parameterize the Kohn-Sham DFT Hamiltonian has emerged as a promising avenue for accelerating electronic structure computations. Despite advancements, challenges such as the necessity for computing extensive DFT training data to explore each new system and the complexity of establishing accurate ML models for multi-elemental materials still exist. Addressing these hurdles, this study introduces a universal electronic Hamiltonian model trained on Hamiltonian matrices obtained from first-principles DFT calculations of nearly all crystal structures on the Materials Project. We demonstrate its generality in predicting electronic structures across the whole periodic table, including complex multi-elemental systems, solid-state electrolytes, Moiré twisted bilayer heterostructure, and metal-organic frameworks (MOFs). Moreover, we utilize the universal model to conduct high-throughput calculations of electronic structures for crystals in GeNOME datasets, identifying 3,940 crystals with direct band gaps and 5,109 crystals with flat bands. By offering a reliable efficient framework for computing electronic properties, this universal Hamiltonian model lays the groundwork for advancements in diverse fields, such as easily providing a huge data set of electronic structures and also making the materials design across the whole periodic table possible.
- [1487] arXiv:2402.09259 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: SyntaxShap: Syntax-aware Explainability Method for Text GenerationComments: Submitted to ACL 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: To harness the power of large language models in safety-critical domains we need to ensure the explainability of their predictions. However, despite the significant attention to model interpretability, there remains an unexplored domain in explaining sequence-to-sequence tasks using methods tailored for textual data. This paper introduces SyntaxShap, a local, model-agnostic explainability method for text generation that takes into consideration the syntax in the text data. The presented work extends Shapley values to account for parsing-based syntactic dependencies. Taking a game theoric approach, SyntaxShap only considers coalitions constraint by the dependency tree. We adopt a model-based evaluation to compare SyntaxShap and its weighted form to state-of-the-art explainability methods adapted to text generation tasks, using diverse metrics including faithfulness, complexity, coherency, and semantic alignment of the explanations to the model. We show that our syntax-aware method produces explanations that help build more faithful, coherent, and interpretable explanations for predictions by autoregressive models.
- [1488] arXiv:2402.09265 (cross-list from cs.DB) [ pdf , ps , other ]
-
Title: Computational Complexity of Preferred Subset Repairs on Data-GraphsComments: 16 pages, 2 figures, AppendixSubjects: Databases (cs.DB) ; Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
Abstract: The problem of repairing inconsistent knowledge bases has a long history within the communities of database theory and knowledge representation and reasoning, especially from the perspective of structured data. However, as the data available in real-world domains becomes more complex and interconnected, the need naturally arises for developing new types of repositories, representation languages, and semantics, to allow for more suitable ways to query and reason about it. Graph databases provide an effective way to represent relationships among semi-structured data, and allow processing and querying these connections efficiently. In this work, we focus on the problem of computing prioritized repairs over graph databases with data values, using a notion of consistency based on Reg-GXPath expressions as integrity constraints. We present several preference criteria based on the standard subset repair semantics, incorporating weights, multisets, and set-based priority levels. We study the most common repairing tasks, showing that it is possible to maintain the same computational complexity as in the case where no preference criterion is available for exploitation. To complete the picture, we explore the complexity of consistent query answering in this setting and obtain tight lower and upper bounds for all the preference criteria introduced.
- [1489] arXiv:2402.09267 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-EvaluationXiaoying Zhang , Baolin Peng , Ye Tian , Jingyan Zhou , Lifeng Jin , Linfeng Song , Haitao Mi , Helen MengComments: 19 pagesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Despite showing increasingly human-like abilities, large language models (LLMs) often struggle with factual inaccuracies, i.e. "hallucinations", even when they hold relevant knowledge. To address these hallucinations, current approaches typically necessitate high-quality human factuality annotations. In this work, we explore Self-Alignment for Factuality, where we leverage the self-evaluation capability of an LLM to provide training signals that steer the model towards factuality. Specifically, we incorporate Self-Eval, a self-evaluation component, to prompt an LLM to validate the factuality of its own generated responses solely based on its internal knowledge. Additionally, we design Self-Knowledge Tuning (SK-Tuning) to augment the LLM's self-evaluation ability by improving the model's confidence estimation and calibration. We then utilize these self-annotated responses to fine-tune the model via Direct Preference Optimization algorithm. We show that the proposed self-alignment approach substantially enhances factual accuracy over Llama family models across three key knowledge-intensive tasks on TruthfulQA and BioGEN.
- [1490] arXiv:2402.09269 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Personalized Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large language models (LLMs) have significantly advanced Natural Language Processing (NLP) tasks in recent years. However, their universal nature poses limitations in scenarios requiring personalized responses, such as recommendation systems and chatbots. This paper investigates methods to personalize LLMs, comparing fine-tuning and zero-shot reasoning approaches on subjective tasks. Results demonstrate that personalized fine-tuning improves model reasoning compared to non-personalized models. Experiments on datasets for emotion recognition and hate speech detection show consistent performance gains with personalized methods across different LLM architectures. These findings underscore the importance of personalization for enhancing LLM capabilities in subjective text perception tasks.
- [1491] arXiv:2402.09281 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Synergistic eigenanalysis of covariance and Hessian matrices for enhanced binary classificationAgus Hartoyo , Jan Argasiński , Aleksandra Trenk , Kinga Przybylska , Anna Błasiak , Alessandro CrimiComments: 19 pages, 6 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Covariance and Hessian matrices have been analyzed separately in the literature for classification problems. However, integrating these matrices has the potential to enhance their combined power in improving classification performance. We present a novel approach that combines the eigenanalysis of a covariance matrix evaluated on a training set with a Hessian matrix evaluated on a deep learning model to achieve optimal class separability in binary classification tasks. Our approach is substantiated by formal proofs that establish its capability to maximize between-class mean distance and minimize within-class variances. By projecting data into the combined space of the most relevant eigendirections from both matrices, we achieve optimal class separability as per the linear discriminant analysis (LDA) criteria. Empirical validation across neural and health datasets consistently supports our theoretical framework and demonstrates that our method outperforms established methods. Our method stands out by addressing both LDA criteria, unlike PCA and the Hessian method, which predominantly emphasize one criterion each. This comprehensive approach captures intricate patterns and relationships, enhancing classification performance. Furthermore, through the utilization of both LDA criteria, our method outperforms LDA itself by leveraging higher-dimensional feature spaces, in accordance with Cover's theorem, which favors linear separability in higher dimensions. Our method also surpasses kernel-based methods and manifold learning techniques in performance. Additionally, our approach sheds light on complex DNN decision-making, rendering them comprehensible within a 2D space.
- [1492] arXiv:2402.09283 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Attacks, Defenses and Evaluations for LLM Conversation Safety: A SurveyComments: Accepted to NAACL 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Abstract: Large Language Models (LLMs) are now commonplace in conversation applications. However, their risks of misuse for generating harmful responses have raised serious societal concerns and spurred recent research on LLM conversation safety. Therefore, in this survey, we provide a comprehensive overview of recent studies, covering three critical aspects of LLM conversation safety: attacks, defenses, and evaluations. Our goal is to provide a structured summary that enhances understanding of LLM conversation safety and encourages further investigation into this important subject. For easy reference, we have categorized all the studies mentioned in this survey according to our taxonomy, available at: this https URL .
- [1493] arXiv:2402.09290 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Learning Interpretable Policies in Hindsight-Observable POMDPs through Partially Supervised Reinforcement LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Deep reinforcement learning has demonstrated remarkable achievements across diverse domains such as video games, robotic control, autonomous driving, and drug discovery. Common methodologies in partially-observable domains largely lean on end-to-end learning from high-dimensional observations, such as images, without explicitly reasoning about true state. We suggest an alternative direction, introducing the Partially Supervised Reinforcement Learning (PSRL) framework. At the heart of PSRL is the fusion of both supervised and unsupervised learning. The approach leverages a state estimator to distill supervised semantic state information from high-dimensional observations which are often fully observable at training time. This yields more interpretable policies that compose state predictions with control. In parallel, it captures an unsupervised latent representation. These two-the semantic state and the latent state-are then fused and utilized as inputs to a policy network. This juxtaposition offers practitioners a flexible and dynamic spectrum: from emphasizing supervised state information to integrating richer, latent insights. Extensive experimental results indicate that by merging these dual representations, PSRL offers a potent balance, enhancing model interpretability while preserving, and often significantly outperforming, the performance benchmarks set by traditional methods in terms of reward and convergence speed.
- [1494] arXiv:2402.09303 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Immediate generalisation in humans but a generalisation lag in deep neural networks -- evidence for representational divergence?Comments: Under review at the ICLR 2024 Workshop on Representational Alignment (Re-Align)Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
Abstract: Recent research has seen many behavioral comparisons between humans and deep neural networks (DNNs) in the domain of image classification. Often, comparison studies focus on the end-result of the learning process by measuring and comparing the similarities in the representations of object categories once they have been formed. However, the process of how these representations emerge -- that is, the behavioral changes and intermediate stages observed during the acquisition -- is less often directly and empirically compared. Here we report a detailed investigation of how transferable representations are acquired in human observers and various classic and state-of-the-art DNNs. We develop a constrained supervised learning environment in which we align learning-relevant parameters such as starting point, input modality, available input data and the feedback provided. Across the whole learning process we evaluate and compare how well learned representations can be generalized to previously unseen test data. Our findings indicate that in terms of absolute classification performance DNNs demonstrate a level of data efficiency comparable to -- and sometimes even exceeding that -- of human learners, challenging some prevailing assumptions in the field. However, comparisons across the entire learning process reveal significant representational differences: while DNNs' learning is characterized by a pronounced generalisation lag, humans appear to immediately acquire generalizable representations without a preliminary phase of learning training set-specific information that is only later transferred to novel data.
- [1495] arXiv:2402.09305 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Embracing the black box: Heading towards foundation models for causal discovery from time series dataComments: AAAI Workshop (AI4TS) 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Causal discovery from time series data encompasses many existing solutions, including those based on deep learning techniques. However, these methods typically do not endorse one of the most prevalent paradigms in deep learning: End-to-end learning. To address this gap, we explore what we call Causal Pretraining. A methodology that aims to learn a direct mapping from multivariate time series to the underlying causal graphs in a supervised manner. Our empirical findings suggest that causal discovery in a supervised manner is possible, assuming that the training and test time series samples share most of their dynamics. More importantly, we found evidence that the performance of Causal Pretraining can increase with data and model size, even if the additional data do not share the same dynamics. Further, we provide examples where causal discovery for real-world data with causally pretrained neural networks is possible within limits. We argue that this hints at the possibility of a foundation model for causal discovery.
- [1496] arXiv:2402.09318 (cross-list from cs.SD) [ pdf , ps , other ]
-
Title: Leveraging Pre-Trained Autoencoders for Interpretable Prototype Learning of Music AudioPablo Alonso-Jiménez , Leonardo Pepino , Roser Batlle-Roca , Pablo Zinemanas , Dmitry Bogdanov , Xavier Serra , Martín RocamoraSubjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
Abstract: We present PECMAE, an interpretable model for music audio classification based on prototype learning. Our model is based on a previous method, APNet, which jointly learns an autoencoder and a prototypical network. Instead, we propose to decouple both training processes. This enables us to leverage existing self-supervised autoencoders pre-trained on much larger data (EnCodecMAE), providing representations with better generalization. APNet allows prototypes' reconstruction to waveforms for interpretability relying on the nearest training data samples. In contrast, we explore using a diffusion decoder that allows reconstruction without such dependency. We evaluate our method on datasets for music instrument classification (Medley-Solos-DB) and genre recognition (GTZAN and a larger in-house dataset), the latter being a more challenging task not addressed with prototypical networks before. We find that the prototype-based models preserve most of the performance achieved with the autoencoder embeddings, while the sonification of prototypes benefits understanding the behavior of the classifier.
- [1497] arXiv:2402.09320 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference OptimizationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large Language Models (LLMs) rely on Human Preference Alignment (HPA) to ensure the generation of safe content. Due to the heavy cost associated with fine-tuning, fine-tuning-free methods have emerged, typically modifying LLM decoding with external auxiliary methods. However, these methods do not essentially enhance the LLM itself. In this paper, we rethink the derivation procedures of DPO, based on which we conversely build an instant scorer using the states of the LLM before and after In-context Learning (ICL). Accordingly, we propose a novel approach called In-Context Direct Preference Optimization (ICDPO). It enables LLMs to borrow the HPA capabilities from superior LLMs with ICL, generating well-aligned responses as estimated by the aforementioned instant scorer, thereby enhancing the final performance. ICDPO can be further enhanced with a two-stage retriever and an upgraded scorer, both offering benefits. Extensive experiments show its effectiveness, particularly in outperforming two fine-tuning-free baselines, and it exhibits competitiveness with SFT + LoRA. We also conduct detailed analyses to offer comprehensive insights into ICDPO.
- [1498] arXiv:2402.09338 (cross-list from physics.comp-ph) [ pdf , ps , other ]
-
Title: Neural Networks Asymptotic Behaviours for the Resolution of Inverse ProblemsSubjects: Computational Physics (physics.comp-ph) ; Artificial Intelligence (cs.AI); High Energy Physics - Lattice (hep-lat); High Energy Physics - Theory (hep-th)
Abstract: This paper presents a study of the effectiveness of Neural Network (NN) techniques for deconvolution inverse problems relevant for applications in Quantum Field Theory, but also in more general contexts. We consider NN's asymptotic limits, corresponding to Gaussian Processes (GPs), where non-linearities in the parameters of the NN can be neglected. Using these resulting GPs, we address the deconvolution inverse problem in the case of a quantum harmonic oscillator simulated through Monte Carlo techniques on a lattice. In this simple toy model, the results of the inversion can be compared with the known analytical solution. Our findings indicate that solving the inverse problem with a NN yields less performing results than those obtained using the GPs derived from NN's asymptotic limits. Furthermore, we observe the trained NN's accuracy approaching that of GPs with increasing layer width. Notably, one of these GPs defies interpretation as a probabilistic model, offering a novel perspective compared to established methods in the literature. Our results suggest the need for detailed studies of the training dynamics in more realistic set-ups.
- [1499] arXiv:2402.09345 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Mitigating Reward Hacking via Information-Theoretic Reward ModelingComments: 26 pages, 28 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Despite the success of reinforcement learning from human feedback (RLHF) in aligning language models with human values, reward hacking, also termed reward overoptimization, remains a critical challenge, which primarily stems from limitations in reward modeling, i.e., generalizability of the reward model and inconsistency in the preference dataset. In this work, we tackle this problem from an information theoretic-perspective, and propose a generalizable and robust framework for reward modeling, namely InfoRM, by introducing a variational information bottleneck objective to filter out irrelevant information and developing a mechanism for model complexity modulation. Notably, we further identify a correlation between overoptimization and outliers in the latent space, establishing InfoRM as a promising tool for detecting reward overoptimization. Inspired by this finding, we propose the Integrated Cluster Deviation Score (ICDS), which quantifies deviations in the latent space, as an indicator of reward overoptimization to facilitate the development of online mitigation strategies. Extensive experiments on a wide range of settings and model scales (70M, 440M, 1.4B, and 7B) support the effectiveness of InfoRM. Further analyses reveal that InfoRM's overoptimization detection mechanism is effective, potentially signifying a notable advancement in the field of RLHF. Code will be released upon acceptance.
- [1500] arXiv:2402.09355 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: Single-Reset Divide & Conquer Imitation LearningSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: Demonstrations are commonly used to speed up the learning process of Deep Reinforcement Learning algorithms. To cope with the difficulty of accessing multiple demonstrations, some algorithms have been developed to learn from a single demonstration. In particular, the Divide & Conquer Imitation Learning algorithms leverage a sequential bias to learn a control policy for complex robotic tasks using a single state-based demonstration. The latest version, DCIL-II demonstrates remarkable sample efficiency. This novel method operates within an extended Goal-Conditioned Reinforcement Learning framework, ensuring compatibility between intermediate and subsequent goals extracted from the demonstration. However, a fundamental limitation arises from the assumption that the system can be reset to specific states along the demonstrated trajectory, confining the application to simulated systems. In response, we introduce an extension called Single-Reset DCIL (SR-DCIL), designed to overcome this constraint by relying on a single initial state reset rather than sequential resets. To address this more challenging setting, we integrate two mechanisms inspired by the Learning from Demonstrations literature, including a Demo-Buffer and Value Cloning, to guide the agent toward compatible success states. In addition, we introduce Approximate Goal Switching to facilitate training to reach goals distant from the reset state. Our paper makes several contributions, highlighting the importance of the reset assumption in DCIL-II, presenting the mechanisms of SR-DCIL variants and evaluating their performance in challenging robotic tasks compared to DCIL-II. In summary, this work offers insights into the significance of reset assumptions in the framework of DCIL and proposes SR-DCIL, a first step toward a versatile algorithm capable of learning control policies under a weaker reset assumption.
- [1501] arXiv:2402.09360 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: HiRE: High Recall Approximate Top-$k$ Estimation for Efficient LLM InferenceYashas Samaga B L , Varun Yerram , Chong You , Srinadh Bhojanapalli , Sanjiv Kumar , Prateek Jain , Praneeth NetrapalliSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Autoregressive decoding with generative Large Language Models (LLMs) on accelerators (GPUs/TPUs) is often memory-bound where most of the time is spent on transferring model parameters from high bandwidth memory (HBM) to cache. On the other hand, recent works show that LLMs can maintain quality with significant sparsity/redundancy in the feedforward (FFN) layers by appropriately training the model to operate on a top-$k$ fraction of rows/columns (where $k \approx 0.05$), there by suggesting a way to reduce the transfer of model parameters, and hence latency. However, exploiting this sparsity for improving latency is hindered by the fact that identifying top rows/columns is data-dependent and is usually performed using full matrix operations, severely limiting potential gains. To address these issues, we introduce HiRE (High Recall Approximate Top-k Estimation). HiRE comprises of two novel components: (i) a compression scheme to cheaply predict top-$k$ rows/columns with high recall, followed by full computation restricted to the predicted subset, and (ii) DA-TOP-$k$: an efficient multi-device approximate top-$k$ operator. We demonstrate that on a one billion parameter model, HiRE applied to both the softmax as well as feedforward layers, achieves almost matching pretraining and downstream accuracy, and speeds up inference latency by $1.47\times$ on a single TPUv5e device.
- [1502] arXiv:2402.09368 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Magic-Me: Identity-Specific Video Customized DiffusionZe Ma , Daquan Zhou , Chun-Hsiao Yeh , Xue-She Wang , Xiuyu Li , Huanrui Yang , Zhen Dong , Kurt Keutzer , Jiashi FengComments: Project Page at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Creating content with specified identities (ID) has attracted significant interest in the field of generative models. In the field of text-to-image generation (T2I), subject-driven creation has achieved great progress with the identity controlled via reference images. However, its extension to video generation is not well explored. In this work, we propose a simple yet effective subject identity controllable video generation framework, termed Video Custom Diffusion (VCD). With a specified identity defined by a few images, VCD reinforces the identity characteristics and injects frame-wise correlation at the initialization stage for stable video outputs. To achieve this, we propose three novel components that are essential for high-quality identity preservation and stable video generation: 1) a noise initialization method with 3D Gaussian Noise Prior for better inter-frame stability; 2) an ID module based on extended Textual Inversion trained with the cropped identity to disentangle the ID information from the background 3) Face VCD and Tiled VCD modules to reinforce faces and upscale the video to higher resolution while preserving the identity's features. We conducted extensive experiments to verify that VCD is able to generate stable videos with better ID over the baselines. Besides, with the transferability of the encoded identity in the ID module, VCD is also working well with personalized text-to-image models available publicly. The codes are available at this https URL .
- [1503] arXiv:2402.09370 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: Pseudorandom Error-Correcting CodesSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: We construct pseudorandom error-correcting codes (or simply pseudorandom codes), which are error-correcting codes with the property that any polynomial number of codewords are pseudorandom to any computationally-bounded adversary. Efficient decoding of corrupted codewords is possible with the help of a decoding key.
We build pseudorandom codes that are robust to substitution and deletion errors, where pseudorandomness rests on standard cryptographic assumptions. Specifically, pseudorandomness is based on either $2^{O(\sqrt{n})}$-hardness of LPN, or polynomial hardness of LPN and the planted XOR problem at low density.
As our primary application of pseudorandom codes, we present an undetectable watermarking scheme for outputs of language models that is robust to cropping and a constant rate of random substitutions and deletions. The watermark is undetectable in the sense that any number of samples of watermarked text are computationally indistinguishable from text output by the original model. This is the first undetectable watermarking scheme that can tolerate a constant rate of errors.
Our second application is to steganography, where a secret message is hidden in innocent-looking content. We present a constant-rate stateless steganography scheme with robustness to a constant rate of substitutions. Ours is the first stateless steganography scheme with provable steganographic security and any robustness to errors. - [1504] arXiv:2402.09371 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Transformers Can Achieve Length Generalization But Not RobustlySubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Length generalization, defined as the ability to extrapolate from shorter training sequences to longer test ones, is a significant challenge for language models. This issue persists even with large-scale Transformers handling relatively straightforward tasks. In this paper, we test the Transformer's ability of length generalization using the task of addition of two integers. We show that the success of length generalization is intricately linked to the data format and the type of position encoding. Using the right combination of data format and position encodings, we show for the first time that standard Transformers can extrapolate to a sequence length that is 2.5x the input length. Nevertheless, unlike in-distribution generalization, length generalization remains fragile, significantly influenced by factors like random weight initialization and training data order, leading to large variances across different random seeds.
- [1505] arXiv:2402.09372 (cross-list from eess.IV) [ pdf , ps , other ]
-
Title: Deep Rib Fracture Instance Segmentation and Classification from CT on the RibFrac ChallengeJiancheng Yang , Rui Shi , Liang Jin , Xiaoyang Huang , Kaiming Kuang , Donglai Wei , Shixuan Gu , Jianying Liu , Pengfei Liu , Zhizhong Chai , Yongjie Xiao , Hao Chen , Liming Xu , Bang Du , Xiangyi Yan , Hao Tang , Adam Alessio , Gregory Holste , Jiapeng Zhang , Xiaoming Wang , Jianye He , Lixuan Che , Hanspeter Pfister , Ming Li , Bingbing NiComments: Challenge paper for MICCAI RibFrac Challenge ( this https URL )Subjects: Image and Video Processing (eess.IV) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Rib fractures are a common and potentially severe injury that can be challenging and labor-intensive to detect in CT scans. While there have been efforts to address this field, the lack of large-scale annotated datasets and evaluation benchmarks has hindered the development and validation of deep learning algorithms. To address this issue, the RibFrac Challenge was introduced, providing a benchmark dataset of over 5,000 rib fractures from 660 CT scans, with voxel-level instance mask annotations and diagnosis labels for four clinical categories (buckle, nondisplaced, displaced, or segmental). The challenge includes two tracks: a detection (instance segmentation) track evaluated by an FROC-style metric and a classification track evaluated by an F1-style metric. During the MICCAI 2020 challenge period, 243 results were evaluated, and seven teams were invited to participate in the challenge summary. The analysis revealed that several top rib fracture detection solutions achieved performance comparable or even better than human experts. Nevertheless, the current rib fracture classification solutions are hardly clinically applicable, which can be an interesting area in the future. As an active benchmark and research resource, the data and online evaluation of the RibFrac Challenge are available at the challenge website. As an independent contribution, we have also extended our previous internal baseline by incorporating recent advancements in large-scale pretrained networks and point-based rib segmentation techniques. The resulting FracNet+ demonstrates competitive performance in rib fracture detection, which lays a foundation for further research and development in AI-assisted rib fracture detection and diagnosis.
- [1506] arXiv:2402.09384 (cross-list from econ.TH) [ pdf , ps , html , other ]
-
Title: Persuasion, Delegation, and Private Information in Algorithm-Assisted DecisionsSubjects: Theoretical Economics (econ.TH) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Computer Science and Game Theory (cs.GT); Human-Computer Interaction (cs.HC)
Abstract: A principal designs an algorithm that generates a publicly observable prediction of a binary state. She must decide whether to act directly based on the prediction or to delegate the decision to an agent with private information but potential misalignment. We study the optimal design of the prediction algorithm and the delegation rule in such environments. Three key findings emerge: (1) Delegation is optimal if and only if the principal would make the same binary decision as the agent had she observed the agent's information. (2) Providing the most informative algorithm may be suboptimal even if the principal can act on the algorithm's prediction. Instead, the optimal algorithm may provide more information about one state and restrict information about the other. (3) Well-intentioned policies aiming to provide more information, such as keeping a "human-in-the-loop" or requiring maximal prediction accuracy, could strictly worsen decision quality compared to systems with no human or no algorithmic assistance. These findings predict the underperformance of human-machine collaborations if no measures are taken to mitigate common preference misalignment between algorithms and human decision-makers.
- [1507] arXiv:2402.09392 (cross-list from cs.MM) [ pdf , ps , other ]
-
Title: LL-GABR: Energy Efficient Live Video Streaming Using Reinforcement LearningComments: 10 pages, 3 figures, 3 TablesSubjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI)
Abstract: Over the recent years, research and development in adaptive bitrate (ABR) algorithms for live video streaming have been successful in improving users' quality of experience (QoE) by reducing latency to near real-time levels while delivering higher bitrate videos with minimal rebuffering time. However, the QoE models used by these ABR algorithms do not take into account that a large portion of live video streaming clients use mobile devices where a higher bitrate does not necessarily translate into higher perceived quality. Ignoring perceived quality results in playing videos at higher bitrates without a significant increase in perceptual video quality and becomes a burden for battery-constrained mobile devices due to higher energy consumption. In this paper, we propose LL-GABR, a deep reinforcement learning approach that models the QoE using perceived video quality instead of bitrate and uses energy consumption along with other metrics like latency, rebuffering events, and smoothness. LL-GABR makes no assumptions about the underlying video, environment, or network settings and can operate flexibly on different video titles, each having a different bitrate encoding ladder without additional re-training, unlike existing learning-based ABRs. Trace-driven experimental results show that LL-GABR outperforms the state-of-the-art approaches by up to 44% in terms of perceptual QoE and a 73% increase in energy efficiency as a result of reducing net energy consumption by 11%.
- [1508] arXiv:2402.09398 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM InferenceSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Many computational factors limit broader deployment of large language models. In this paper, we focus on a memory bottleneck imposed by the key-value (KV) cache, a computational shortcut that requires storing previous KV pairs during decoding. While existing KV cache methods approach this problem by pruning or evicting large swaths of relatively less important KV pairs to dramatically reduce the memory footprint of the cache, they can have limited success in tasks that require recollecting a majority of previous tokens. To alleviate this issue, we propose LESS, a simple integration of a (nearly free) constant sized cache with eviction-based cache methods, such that all tokens can be queried at later decoding steps. Its ability to retain information throughout time shows merit on a variety of tasks where we demonstrate LESS can help reduce the performance gap from caching everything, sometimes even matching it, all while being efficient.
- [1509] arXiv:2402.09401 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Reinforcement Learning from Human Feedback with Active QueriesComments: 28 pages, 1 figure, 4 tableSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Optimization and Control (math.OC); Machine Learning (stat.ML)
Abstract: Aligning large language models (LLM) with human preference plays a key role in building modern generative models and can be achieved by reinforcement learning from human feedback (RLHF). Despite their superior performance, current RLHF approaches often require a large amount of human-labelled preference data, which is expensive to collect. In this paper, inspired by the success of active learning, we address this problem by proposing query-efficient RLHF methods. We first formalize the alignment problem as a contextual dueling bandit problem and design an active-query-based proximal policy optimization (APPO) algorithm with an $\tilde{O}(d^2/\Delta)$ regret bound and an $\tilde{O}(d^2/\Delta^2)$ query complexity, where $d$ is the dimension of feature space and $\Delta$ is the sub-optimality gap over all the contexts. We then propose ADPO, a practical version of our algorithm based on direct preference optimization (DPO) and apply it to fine-tuning LLMs. Our experiments show that ADPO, while only making about half of queries for human preference, matches the performance of the state-of-the-art DPO method.
- [1510] arXiv:2402.09404 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: AQA-Bench: An Interactive Benchmark for Evaluating LLMs' Sequential Reasoning AbilitySubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This paper introduces AQA-Bench, a novel benchmark to assess the sequential reasoning capabilities of large language models (LLMs) in algorithmic contexts, such as depth-first search (DFS). The key feature of our evaluation benchmark lies in its interactive evaluation protocol -- for example, in DFS, the availability of each node's connected edge is contingent upon the model's traversal to that node, thereby necessitating the LLM's ability to effectively remember visited nodes and strategize subsequent moves. We comprehensively build AQA-Bench with three different algorithms, namely binary search, depth-first search, and breadth-first search, and to evaluate the sequential reasoning ability of 12 different LLMs. Our investigations reveal several interesting findings: (1) Closed-source models like GPT-4 and Gemini generally show strong sequential reasoning ability, significantly outperforming open-source LLMs. (2) Naively providing interactive examples may inadvertently hurt few-shot performance. (3) A very limited number of predecessor steps following the optimal policy can substantially boost small models' performance. (4) The scaling correlation between performance and model size is not always significant, sometimes even showcasing an inverse trend. We hope our study can catalyze future work on advancing the understanding and enhancement of LLMs' capabilities in sequential reasoning. The code is available at this https URL .
- [1511] arXiv:2402.09427 (cross-list from eess.SP) [ pdf , ps , html , other ]
-
Title: DoorINet: A Deep-Learning Inertial Framework for Door-Mounted IoT ApplicationsComments: 10 pages, 14 figures, 4 tablesSubjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Many Internet of Things applications utilize low-cost, micro, electro-mechanical inertial sensors. A common task is orientation estimation. To tackle such a task, attitude and heading reference system algorithms are applied. Relying on the gyroscope readings, the accelerometer readings are used to update the attitude angles, and magnetometer measurements are utilized to update the heading angle. In indoor environments, magnetometers suffer from interference that degrades their performance. This mainly influences applications focused on estimating the heading angle like finding the heading angle of a closet or fridge door. To circumvent such situations, we propose DoorINet, an end-to-end deep-learning framework to calculate the heading angle from door-mounted, low-cost inertial sensors without using magnetometers. To evaluate our approach, we record a unique dataset containing 391 minutes of accelerometer and gyroscope measurements and corresponding ground-truth heading angle. We show that our proposed approach outperforms commonly used, model based approaches and data-driven methods.
- [1512] arXiv:2402.09430 (cross-list from eess.SP) [ pdf , ps , html , other ]
-
Title: WiMANS: A Benchmark Dataset for WiFi-based Multi-user Activity SensingShuokang Huang , Kaihan Li , Di You , Yichong Chen , Arvin Lin , Siying Liu , Xiaohui Li , Julie A. McCannComments: We present WiMANS, to our knowledge, the first dataset for multi-user activity sensing based on WiFiSubjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Abstract: WiFi-based human sensing has exhibited remarkable potential to analyze user behaviors in a non-intrusive and device-free manner, benefiting applications as diverse as smart homes and healthcare. However, most previous works focus on single-user sensing, which has limited practicability in scenarios involving multiple users. Although recent studies have begun to investigate WiFi-based multi-user sensing, there remains a lack of benchmark datasets to facilitate reproducible and comparable research. To bridge this gap, we present WiMANS, to our knowledge, the first dataset for multi-user sensing based on WiFi. WiMANS contains over 9.4 hours of dual-band WiFi Channel State Information (CSI), as well as synchronized videos, monitoring simultaneous activities of multiple users. We exploit WiMANS to benchmark the performance of state-of-the-art WiFi-based human sensing models and video-based models, posing new challenges and opportunities for future work. We believe WiMANS can push the boundaries of current studies and catalyze the research on WiFi-based multi-user sensing.
- [1513] arXiv:2402.09433 (cross-list from eess.SP) [ pdf , ps , html , other ]
-
Title: Electrical Behavior Association Mining for Household ShortTerm Energy Consumption ForecastingComments: 3 figures and 4 tables; This manuscript is submitted for possible publicationSubjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)
Abstract: Accurate household short-term energy consumption forecasting (STECF) is crucial for home energy management, but it is technically challenging, due to highly random behaviors of individual residential users. To improve the accuracy of STECF on a day-ahead scale, this paper proposes an novel STECF methodology that leverages association mining in electrical behaviors. First, a probabilistic association quantifying and discovering method is proposed to model the pairwise behaviors association and generate associated clusters. Then, a convolutional neural network-gated recurrent unit (CNN-GRU) based forecasting is provided to explore the temporal correlation and enhance accuracy. The testing results demonstrate that this methodology yields a significant enhancement in the STECF.
- [1514] arXiv:2402.09442 (cross-list from eess.SP) [ pdf , ps , other ]
-
Title: Progress in artificial intelligence applications based on the combination of self-driven sensors and deep learningComments: This aticle was accepted by ieee conferenceSubjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI)
Abstract: In the era of Internet of Things, how to develop a smart sensor system with sustainable power supply, easy deployment and flexible use has become a difficult problem to be solved. The traditional power supply has problems such as frequent replacement or charging when in use, which limits the development of wearable devices. The contact-to-separate friction nanogenerator (TENG) was prepared by using polychotomy thy lene (PTFE) and aluminum (AI) foils. Human motion energy was collected by human body arrangement, and human motion posture was monitored according to the changes of output electrical signals. In 2012, Academician Wang Zhong lin and his team invented the triboelectric nanogenerator (TENG), which uses Maxwell displacement current as a driving force to directly convert mechanical stimuli into electrical signals, so it can be used as a self-driven sensor. Teng-based sensors have the advantages of simple structure and high instantaneous power density, which provides an important means for building intelligent sensor systems. At the same time, machine learning, as a technology with low cost, short development cycle, strong data processing ability and prediction ability, has a significant effect on the processing of a large number of electrical signals generated by TENG, and the combination with TENG sensors will promote the rapid development of intelligent sensor networks in the future. Therefore, this paper is based on the intelligent sound monitoring and recognition system of TENG, which has good sound recognition capability, and aims to evaluate the feasibility of the sound perception module architecture in ubiquitous sensor networks.
- [1515] arXiv:2402.09443 (cross-list from eess.SP) [ pdf , ps , other ]
-
Title: Review of algorithms for predicting fatigue using EEGComments: arXiv admin note: text overlap with arXiv:2401.15766Subjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Fatigue detection is of paramount importance in enhancing safety, productivity, and well-being across diverse domains, including transportation, healthcare, and industry. This scientific paper presents a comprehensive investigation into the application of machine learning algorithms for the detection of physiological fatigue using Electroencephalogram (EEG) signals. The primary objective of this study was to assess the efficacy of various algorithms in predicting an individual's level of fatigue based on EEG data.
- [1516] arXiv:2402.09444 (cross-list from eess.SP) [ pdf , ps , html , other ]
-
Title: Multimodal Action Quality AssessmentComments: IEEE Transactions on Image Processing 2024Subjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Action quality assessment (AQA) is to assess how well an action is performed. Previous works perform modelling by only the use of visual information, ignoring audio information. We argue that although AQA is highly dependent on visual information, the audio is useful complementary information for improving the score regression accuracy, especially for sports with background music, such as figure skating and rhythmic gymnastics. To leverage multimodal information for AQA, i.e., RGB, optical flow and audio information, we propose a Progressive Adaptive Multimodal Fusion Network (PAMFN) that separately models modality-specific information and mixed-modality information. Our model consists of with three modality-specific branches that independently explore modality-specific information and a mixed-modality branch that progressively aggregates the modality-specific information from the modality-specific branches. To build the bridge between modality-specific branches and the mixed-modality branch, three novel modules are proposed. First, a Modality-specific Feature Decoder module is designed to selectively transfer modality-specific information to the mixed-modality branch. Second, when exploring the interaction between modality-specific information, we argue that using an invariant multimodal fusion policy may lead to suboptimal results, so as to take the potential diversity in different parts of an action into consideration. Therefore, an Adaptive Fusion Module is proposed to learn adaptive multimodal fusion policies in different parts of an action. This module consists of several FusionNets for exploring different multimodal fusion strategies and a PolicyNet for deciding which FusionNets are enabled. Third, a module called Cross-modal Feature Decoder is designed to transfer cross-modal features generated by Adaptive Fusion Module to the mixed-modality branch.
- [1517] arXiv:2402.09445 (cross-list from eess.SP) [ pdf , ps , html , other ]
-
Title: iMove: Exploring Bio-impedance Sensing for Fitness Activity RecognitionComments: Accepted by percom2024Subjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
Abstract: Automatic and precise fitness activity recognition can be beneficial in aspects from promoting a healthy lifestyle to personalized preventative healthcare. While IMUs are currently the prominent fitness tracking modality, through iMove, we show bio-impedence can help improve IMU-based fitness tracking through sensor fusion and contrastive this http URL evaluate our methods, we conducted an experiment including six upper body fitness activities performed by ten subjects over five days to collect synchronized data from bio-impedance across two wrists and IMU on the left wrist.The contrastive learning framework uses the two modalities to train a better IMU-only classification model, where bio-impedance is only required at the training phase, by which the average Macro F1 score with the input of a single IMU was improved by 3.22 \% reaching 84.71 \% compared to the 81.49 \% of the IMU baseline model. We have also shown how bio-impedance can improve human activity recognition (HAR) directly through sensor fusion, reaching an average Macro F1 score of 89.57 \% (two modalities required for both training and inference) even if Bio-impedance alone has an average macro F1 score of 75.36 \%, which is outperformed by IMU alone. In addition, similar results were obtained in an extended study on lower body fitness activity classification, demonstrating the generalisability of our approach.Our findings underscore the potential of sensor fusion and contrastive learning as valuable tools for advancing fitness activity recognition, with bio-impedance playing a pivotal role in augmenting the capabilities of IMU-based systems.
- [1518] arXiv:2402.09447 (cross-list from eess.SP) [ pdf , ps , html , other ]
-
Title: Wavelet Analysis of Noninvasive EEG Signals Discriminates Complex and Natural Grasp TypesSubjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
Abstract: This research aims to decode hand grasps from Electroencephalograms (EEGs) for dexterous neuroprosthetic development and Brain-Computer Interface (BCI) applications, especially for patients with motor disorders. Particularly, it focuses on distinguishing two complex natural power and precision grasps in addition to a neutral condition as a no-movement condition using a new EEG-based BCI platform and wavelet signal processing. Wavelet analysis involved generating time-frequency and topographic maps from wavelet power coefficients. Then, by using machine learning techniques with novel wavelet features, we achieved high average accuracies: 85.16% for multiclass, 95.37% for No-Movement vs Power, 95.40% for No-Movement vs Precision, and 88.07% for Power vs Precision, demonstrating the effectiveness of these features in EEG-based grasp differentiation. In contrast to previous studies, a critical part of our study was permutation feature importance analysis, which highlighted key features for grasp classification. It revealed that the most crucial brain activities during grasping occur in the motor cortex, within the alpha and beta frequency bands. These insights demonstrate the potential of wavelet features in real-time neuroprosthetic technology and BCI applications.
- [1519] arXiv:2402.09448 (cross-list from eess.SP) [ pdf , ps , html , other ]
-
Title: A Comparative Study of Conventional and Tripolar EEG for High-Performance Reach-to-Grasp BCI SystemsSubjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This study aims to enhance BCI applications for individuals with motor impairments by comparing the effectiveness of tripolar EEG (tEEG) with conventional EEG. The focus is on interpreting and decoding various grasping movements, such as power grasp and precision grasp. The goal is to determine which EEG technology is more effective in processing and translating grasp related neural signals. The approach involved experimenting on ten healthy participants who performed two distinct grasp movements: power grasp and precision grasp, with a no movement condition serving as the baseline. Our research presents a thorough comparison between EEG and tEEG in decoding grasping movements. This comparison spans several key parameters, including signal to noise ratio (SNR), spatial resolution via functional connectivity, ERPs, and wavelet time frequency analysis. Additionally, our study involved extracting and analyzing statistical features from the wavelet coefficients, and both binary and multiclass classification methods were employed. Four machine learning algorithms were used to evaluate the decoding accuracies. Our results indicated that tEEG demonstrated superior performance over conventional EEG in various aspects. This included a higher signal to noise ratio, enhanced spatial resolution, and more informative data in ERPs and wavelet time frequency analysis. The use of tEEG led to notable improvements in decoding accuracy for differentiating movement types. Specifically, tEEG achieved around 90% accuracy in binary and 75.97% for multiclass classification. These results are markedly better than those from standard EEG, which recorded a maximum of 77.85% and 61.27% in similar tasks, respectively. These findings highlight the superior effectiveness of tEEG over EEG in decoding grasp types and its competitive or superior performance in complex classifications compared with existing research.
- [1520] arXiv:2402.09450 (cross-list from eess.SP) [ pdf , ps , html , other ]
-
Title: Guiding Masked Representation Learning to Capture Spatio-Temporal Relationship of ElectrocardiogramComments: ICLR 2024. The first three authors contribute equallySubjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Electrocardiograms (ECG) are widely employed as a diagnostic tool for monitoring electrical signals originating from a heart. Recent machine learning research efforts have focused on the application of screening various diseases using ECG signals. However, adapting to the application of screening disease is challenging in that labeled ECG data are limited. Achieving general representation through self-supervised learning (SSL) is a well-known approach to overcome the scarcity of labeled data; however, a naive application of SSL to ECG data, without considering the spatial-temporal relationships inherent in ECG signals, may yield suboptimal results. In this paper, we introduce ST-MEM (Spatio-Temporal Masked Electrocardiogram Modeling), designed to learn spatio-temporal features by reconstructing masked 12-lead ECG data. ST-MEM outperforms other SSL baseline methods in various experimental settings for arrhythmia classification tasks. Moreover, we demonstrate that ST-MEM is adaptable to various lead combinations. Through quantitative and qualitative analysis, we show a spatio-temporal relationship within ECG data. Our code is available at this https URL .
- [1521] arXiv:2402.09453 (cross-list from eess.SP) [ pdf , ps , other ]
-
Title: Improving EEG Signal Classification Accuracy Using Wasserstein Generative Adversarial NetworksComments: 11 pages, 2 tables, 3 figuresSubjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Electroencephalography (EEG) plays a vital role in recording brain activities and is integral to the development of brain-computer interface (BCI) technologies. However, the limited availability and high variability of EEG signals present substantial challenges in creating reliable BCIs. To address this issue, we propose a practical solution drawing on the latest developments in deep learning and Wasserstein Generative Adversarial Network (WGAN). The WGAN was trained on the BCI2000 dataset, consisting of around 1500 EEG recordings and 64 channels from 45 individuals. The generated EEG signals were evaluated via three classifiers yielding improved average accuracies. The quality of generated signals measured using Frechet Inception Distance (FID) yielded scores of 1.345 and 11.565 for eyes-open and closed respectively. Even without a spectral or spatial loss term, our WGAN model was able to emulate the spectral and spatial properties of the EEG training data. The WGAN-generated data mirrored the dominant alpha activity during closed-eye resting and high delta waves in the training data in its topographic map and power spectral density (PSD) plot. Our research testifies to the potential of WGANs in addressing the limited EEG data issue for BCI development by enhancing a small dataset to improve classifier generalizability.
- [1522] arXiv:2402.09456 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Optimistic Thompson Sampling for No-Regret Learning in Unknown GamesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Machine Learning (stat.ML)
Abstract: This work tackles the complexities of multi-player scenarios in \emph{unknown games}, where the primary challenge lies in navigating the uncertainty of the environment through bandit feedback alongside strategic decision-making. We introduce Thompson Sampling (TS)-based algorithms that exploit the information of opponents' actions and reward structures, leading to a substantial reduction in experimental budgets -- achieving over tenfold improvements compared to conventional approaches. Notably, our algorithms demonstrate that, given specific reward structures, the regret bound depends logarithmically on the total action space, significantly alleviating the curse of multi-player. Furthermore, we unveil the \emph{Optimism-then-NoRegret} (OTN) framework, a pioneering methodology that seamlessly incorporates our advancements with established algorithms, showcasing its utility in practical scenarios such as traffic routing and radar sensing in the real world.
- [1523] arXiv:2402.09465 (cross-list from eess.SP) [ pdf , ps , html , other ]
-
Title: RLEEGNet: Integrating Brain-Computer Interfaces with Adaptive AI for Intuitive Responsiveness and High-Accuracy Motor Imagery ClassificationComments: 23 pages, 1 figure, 6 tablesSubjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI)
Abstract: Current approaches to prosthetic control are limited by their reliance on traditional methods, which lack real-time adaptability and intuitive responsiveness. These limitations are particularly pronounced in assistive technologies designed for individuals with diverse cognitive states and motor intentions. In this paper, we introduce a framework that leverages Reinforcement Learning (RL) with Deep Q-Networks (DQN) for classification tasks. Additionally, we present a preprocessing technique using the Common Spatial Pattern (CSP) for multiclass motor imagery (MI) classification in a One-Versus-The-Rest (OVR) manner. The subsequent 'csp space' transformation retains the temporal dimension of EEG signals, crucial for extracting discriminative features. The integration of DQN with a 1D-CNN-LSTM architecture optimizes the decision-making process in real-time, thereby enhancing the system's adaptability to the user's evolving needs and intentions. We elaborate on the data processing methods for two EEG motor imagery datasets. Our innovative model, RLEEGNet, incorporates a 1D-CNN-LSTM architecture as the Online Q-Network within the DQN, facilitating continuous adaptation and optimization of control strategies through feedback. This mechanism allows the system to learn optimal actions through trial and error, progressively improving its performance. RLEEGNet demonstrates high accuracy in classifying MI-EEG signals, achieving as high as 100% accuracy in MI tasks across both the GigaScience (3-class) and BCI-IV-2a (4-class) datasets. These results highlight the potential of combining DQN with a 1D-CNN-LSTM architecture to significantly enhance the adaptability and responsiveness of BCI systems.
- [1524] arXiv:2402.09474 (cross-list from eess.SP) [ pdf , ps , html , other ]
-
Title: Deciphering Heartbeat Signatures: A Vision Transformer Approach to Explainable Atrial Fibrillation Detection from ECG SignalsComments: Accepted for publication at the 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, IEEE EMBC 2024Subjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: Remote patient monitoring based on wearable single-lead electrocardiogram (ECG) devices has significant potential for enabling the early detection of heart disease, especially in combination with artificial intelligence (AI) approaches for automated heart disease detection. There have been prior studies applying AI approaches based on deep learning for heart disease detection. However, these models are yet to be widely accepted as a reliable aid for clinical diagnostics, in part due to the current black-box perception surrounding many AI algorithms. In particular, there is a need to identify the key features of the ECG signal that contribute toward making an accurate diagnosis, thereby enhancing the interpretability of the model. In the present study, we develop a vision transformer approach to identify atrial fibrillation based on single-lead ECG data. A residual network (ResNet) approach is also developed for comparison with the vision transformer approach. These models are applied to the Chapman-Shaoxing dataset to classify atrial fibrillation, as well as another common arrhythmia, sinus bradycardia, and normal sinus rhythm heartbeats. The models enable the identification of the key regions of the heartbeat that determine the resulting classification, and highlight the importance of P-waves and T-waves, as well as heartbeat duration and signal amplitude, in distinguishing normal sinus rhythm from atrial fibrillation and sinus bradycardia.
- [1525] arXiv:2402.09476 (cross-list from q-bio.QM) [ pdf , ps , html , other ]
-
Title: AI-Enabled Lung Cancer PrognosisMahtab Darvish , Ryan Trask , Patrick Tallon , Mélina Khansari , Lei Ren , Michelle Hershman , Bardia YousefiComments: This is the author's version of a book chapter entitled: "Cancer Research: An Interdisciplinary Approach", SpringerJournal-ref: Springer book chapter "Cancer Research: An Interdisciplinary Approach" 2024Subjects: Quantitative Methods (q-bio.QM) ; Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Abstract: Lung cancer is the primary cause of cancer-related mortality, claiming approximately 1.79 million lives globally in 2020, with an estimated 2.21 million new cases diagnosed within the same period. Among these, Non-Small Cell Lung Cancer (NSCLC) is the predominant subtype, characterized by a notably bleak prognosis and low overall survival rate of approximately 25% over five years across all disease stages. However, survival outcomes vary considerably based on the stage at diagnosis and the therapeutic interventions administered. Recent advancements in artificial intelligence (AI) have revolutionized the landscape of lung cancer prognosis. AI-driven methodologies, including machine learning and deep learning algorithms, have shown promise in enhancing survival prediction accuracy by efficiently analyzing complex multi-omics data and integrating diverse clinical variables. By leveraging AI techniques, clinicians can harness comprehensive prognostic insights to tailor personalized treatment strategies, ultimately improving patient outcomes in NSCLC. Overviewing AI-driven data processing can significantly help bolster the understanding and provide better directions for using such systems.
- [1526] arXiv:2402.09494 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: Can AI and humans genuinely communicate?Comments: March 2024 preprintSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Can AI and humans genuinely communicate? In this article, after giving some background and motivating my proposal (sections 1 to 3), I explore a way to answer this question that I call the "mental-behavioral methodology" (sections 4 and 5). This methodology follows the following three steps: First, spell out what mental capacities are sufficient for human communication (as opposed to communication more generally). Second, spell out the experimental paradigms required to test whether a behavior exhibits these capacities. Third, apply or adapt these paradigms to test whether an AI displays the relevant behaviors. If the first two steps are successfully completed, and if the AI passes the tests with human-like results, this constitutes evidence that this AI and humans can genuinely communicate. This mental-behavioral methodology has the advantage that we don't need to understand the workings of black-box algorithms, such as standard deep neural networks. This is comparable to the fact that we don't need to understand how human brains work to know that humans can genuinely communicate. This methodology also has its disadvantages and I will discuss some of them (section 6).
- [1527] arXiv:2402.09495 (cross-list from q-fin.RM) [ pdf , ps , html , other ]
-
Title: On the Potential of Network-Based Features for Fraud DetectionSubjects: Risk Management (q-fin.RM) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Online transaction fraud presents substantial challenges to businesses and consumers, risking significant financial losses. Conventional rule-based systems struggle to keep pace with evolving fraud tactics, leading to high false positive rates and missed detections. Machine learning techniques offer a promising solution by leveraging historical data to identify fraudulent patterns. This article explores using the personalised PageRank (PPR) algorithm to capture the social dynamics of fraud by analysing relationships between financial accounts. The primary objective is to compare the performance of traditional features with the addition of PPR in fraud detection models. Results indicate that integrating PPR enhances the model's predictive power, surpassing the baseline model. Additionally, the PPR feature provides unique and valuable information, evidenced by its high feature importance score. Feature stability analysis confirms consistent feature distributions across training and test datasets.
- [1528] arXiv:2402.09497 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: Instruction Tuning for Secure Code GenerationSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Software Engineering (cs.SE)
Abstract: Modern language models (LMs) have gained widespread acceptance in everyday and professional contexts, particularly in programming. An essential procedure enabling this adoption is instruction tuning, which substantially enhances LMs' practical utility by training them to follow user instructions and human preferences. However, existing instruction tuning schemes overlook a crucial aspect: the security of generated code. As a result, even the state-of-the-art instruction-tuned LMs frequently produce unsafe code, posing significant security risks. In this work, we introduce SafeCoder to address this gap. SafeCoder performs security-centric fine-tuning using a diverse and high-quality dataset that we collected using an automated pipeline. We integrate the security fine-tuning with standard instruction tuning, to facilitate a joint optimization of both security and utility. Despite its simplicity, we show that SafeCoder is effective across a variety of popular LMs and datasets. It is able to drastically improve security (by about 30%), while preserving utility.
- [1529] arXiv:2402.09508 (cross-list from cs.SD) [ pdf , ps , html , other ]
-
Title: Arrange, Inpaint, and Refine: Steerable Long-term Music Audio Generation and Editing via Content-based ControlsSubjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Abstract: Controllable music generation plays a vital role in human-AI music co-creation. While Large Language Models (LLMs) have shown promise in generating high-quality music, their focus on autoregressive generation limits their utility in music editing tasks. To bridge this gap, we introduce a novel Parameter-Efficient Fine-Tuning (PEFT) method. This approach enables autoregressive language models to seamlessly address music inpainting tasks. Additionally, our PEFT method integrates frame-level content-based controls, facilitating track-conditioned music refinement and score-conditioned music arrangement. We apply this method to fine-tune MusicGen, a leading autoregressive music generation model. Our experiments demonstrate promising results across multiple music editing tasks, offering more flexible controls for future AI-driven music editing tools. A demo page\footnote{\url{ this https URL }.} showcasing our work and source codes\footnote{\url{ this https URL }.} are available online.
- [1530] arXiv:2402.09540 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: Why Does Differential Privacy with Large Epsilon Defend Against Practical Membership Inference Attacks?Comments: Accepted at PPAI-24: AAAI Workshop on Privacy-Preserving Artificial IntelligenceSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: For small privacy parameter $\epsilon$, $\epsilon$-differential privacy (DP) provides a strong worst-case guarantee that no membership inference attack (MIA) can succeed at determining whether a person's data was used to train a machine learning model. The guarantee of DP is worst-case because: a) it holds even if the attacker already knows the records of all but one person in the data set; and b) it holds uniformly over all data sets. In practical applications, such a worst-case guarantee may be overkill: practical attackers may lack exact knowledge of (nearly all of) the private data, and our data set might be easier to defend, in some sense, than the worst-case data set. Such considerations have motivated the industrial deployment of DP models with large privacy parameter (e.g. $\epsilon \geq 7$), and it has been observed empirically that DP with large $\epsilon$ can successfully defend against state-of-the-art MIAs. Existing DP theory cannot explain these empirical findings: e.g., the theoretical privacy guarantees of $\epsilon \geq 7$ are essentially vacuous. In this paper, we aim to close this gap between theory and practice and understand why a large DP parameter can prevent practical MIAs. To tackle this problem, we propose a new privacy notion called practical membership privacy (PMP). PMP models a practical attacker's uncertainty about the contents of the private data. The PMP parameter has a natural interpretation in terms of the success rate of a practical MIA on a given data set. We quantitatively analyze the PMP parameter of two fundamental DP mechanisms: the exponential mechanism and Gaussian mechanism. Our analysis reveals that a large DP parameter often translates into a much smaller PMP parameter, which guarantees strong privacy against practical MIAs. Using our findings, we offer principled guidance for practitioners in choosing the DP parameter.
- [1531] arXiv:2402.09546 (cross-list from cs.RO) [ pdf , ps , html , other ]
-
Title: How Secure Are Large Language Models (LLMs) for Navigation in Urban Environments?Subjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: In the field of robotics and automation, navigation systems based on Large Language Models (LLMs) have recently shown impressive performance. However, the security aspects of these systems have received relatively less attention. This paper pioneers the exploration of vulnerabilities in LLM-based navigation models in urban outdoor environments, a critical area given the technology's widespread application in autonomous driving, logistics, and emergency services. Specifically, we introduce a novel Navigational Prompt Suffix (NPS) Attack that manipulates LLM-based navigation models by appending gradient-derived suffixes to the original navigational prompt, leading to incorrect actions. We conducted comprehensive experiments on an LLMs-based navigation model that employs various LLMs for reasoning. Our results, derived from the Touchdown and Map2Seq street-view datasets under both few-shot learning and fine-tuning configurations, demonstrate notable performance declines across three metrics in the face of both white-box and black-box attacks. These results highlight the generalizability and transferability of the NPS Attack, emphasizing the need for enhanced security in LLM-based navigation systems. As an initial countermeasure, we propose the Navigational Prompt Engineering (NPE) Defense strategy, concentrating on navigation-relevant keywords to reduce the impact of adversarial suffixes. While initial findings indicate that this strategy enhances navigational safety, there remains a critical need for the wider research community to develop stronger defense methods to effectively tackle the real-world challenges faced by these systems.
- [1532] arXiv:2402.09579 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: Advancing Building Energy Modeling with Large Language Models: Exploration and Case StudiesSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: The rapid progression in artificial intelligence has facilitated the emergence of large language models like ChatGPT, offering potential applications extending into specialized engineering modeling, especially physics-based building energy modeling. This paper investigates the innovative integration of large language models with building energy modeling software, focusing specifically on the fusion of ChatGPT with EnergyPlus. A literature review is first conducted to reveal a growing trend of incorporating of large language models in engineering modeling, albeit limited research on their application in building energy modeling. We underscore the potential of large language models in addressing building energy modeling challenges and outline potential applications including 1) simulation input generation, 2) simulation output analysis and visualization, 3) conducting error analysis, 4) co-simulation, 5) simulation knowledge extraction and training, and 6) simulation optimization. Three case studies reveal the transformative potential of large language models in automating and optimizing building energy modeling tasks, underscoring the pivotal role of artificial intelligence in advancing sustainable building practices and energy efficiency. The case studies demonstrate that selecting the right large language model techniques is essential to enhance performance and reduce engineering efforts. Besides direct use of large language models, three specific techniques were utilized: 1) prompt engineering, 2) retrieval-augmented generation, and 3) multi-agent large language models. The findings advocate a multidisciplinary approach in future artificial intelligence research, with implications extending beyond building energy modeling to other specialized engineering modeling.
- [1533] arXiv:2402.09581 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: Combatting deepfakes: Policies to address national security threats and rights violationsSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: This paper provides policy recommendations to address threats from deepfakes. First, we provide background information about deepfakes and review the harms they pose. We describe how deepfakes are currently used to proliferate sexual abuse material, commit fraud, manipulate voter behavior, and pose threats to national security. Second, we review previous legislative proposals designed to address deepfakes. Third, we present a comprehensive policy proposal that focuses on addressing multiple parts of the deepfake supply chain. The deepfake supply chain begins with a small number of model developers, model providers, and compute providers, and it expands to include billions of potential deepfake creators. We describe this supply chain in greater detail and describe how entities at each step of the supply chain ought to take reasonable measures to prevent the creation and proliferation of deepfakes. Finally, we address potential counterpoints of our proposal. Overall, deepfakes will present increasingly severe threats to global security and individual liberties. To address these threats, we call on policymakers to enact legislation that addresses multiple parts of the deepfake supply chain.
- [1534] arXiv:2402.09603 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Scalable Graph Self-Supervised LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In regularization Self-Supervised Learning (SSL) methods for graphs, computational complexity increases with the number of nodes in graphs and embedding dimensions. To mitigate the scalability of non-contrastive graph SSL, we propose a novel approach to reduce the cost of computing the covariance matrix for the pre-training loss function with volume-maximization terms. Our work focuses on reducing the cost associated with the loss computation via graph node or dimension sampling. We provide theoretical insight into why dimension sampling would result in accurate loss computations and support it with mathematical derivation of the novel approach. We develop our experimental setup on the node-level graph prediction tasks, where SSL pre-training has shown to be difficult due to the large size of real world graphs. Our experiments demonstrate that the cost associated with the loss computation can be reduced via node or dimension sampling without lowering the downstream performance. Our results demonstrate that sampling mostly results in improved downstream performance. Ablation studies and experimental analysis are provided to untangle the role of the different factors in the experimental setup.
- [1535] arXiv:2402.09604 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Medical Image Segmentation with InTEnt: Integrated Entropy Weighting for Single Image Test-Time AdaptationComments: Code and pre-trained weights: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Test-time adaptation (TTA) refers to adapting a trained model to a new domain during testing. Existing TTA techniques rely on having multiple test images from the same domain, yet this may be impractical in real-world applications such as medical imaging, where data acquisition is expensive and imaging conditions vary frequently. Here, we approach such a task, of adapting a medical image segmentation model with only a single unlabeled test image. Most TTA approaches, which directly minimize the entropy of predictions, fail to improve performance significantly in this setting, in which we also observe the choice of batch normalization (BN) layer statistics to be a highly important yet unstable factor due to only having a single test domain example. To overcome this, we propose to instead integrate over predictions made with various estimates of target domain statistics between the training and test statistics, weighted based on their entropy statistics. Our method, validated on 24 source/target domain splits across 3 medical image datasets surpasses the leading method by 2.9% Dice coefficient on average.
- [1536] arXiv:2402.09609 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: LogicPrpBank: A Corpus for Logical Implication and EquivalenceComments: In the 5th AI4ED Workshop, held in conjunction with The 38th AAAI Conference on Artificial Intelligence, February 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Logic reasoning has been critically needed in problem-solving and decision-making. Although Language Models (LMs) have demonstrated capabilities of handling multiple reasoning tasks (e.g., commonsense reasoning), their ability to reason complex mathematical problems, specifically propositional logic, remains largely underexplored. This lack of exploration can be attributed to the limited availability of annotated corpora. Here, we present a well-labeled propositional logic corpus, LogicPrpBank, containing 7093 Propositional Logic Statements (PLSs) across six mathematical subjects, to study a brand-new task of reasoning logical implication and equivalence. We benchmark LogicPrpBank with widely-used LMs to show that our corpus offers a useful resource for this challenging task and there is ample room for model improvement.
- [1537] arXiv:2402.09611 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Towards Privacy-Aware Sign Language Translation at ScaleSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: A major impediment to the advancement of sign language translation (SLT) is data scarcity. Much of the sign language data currently available on the web cannot be used for training supervised models due to the lack of aligned captions. Furthermore, scaling SLT using large-scale web-scraped datasets bears privacy risks due to the presence of biometric information, which the responsible development of SLT technologies should account for. In this work, we propose a two-stage framework for privacy-aware SLT at scale that addresses both of these issues. We introduce SSVP-SLT, which leverages self-supervised video pretraining on anonymized and unannotated videos, followed by supervised SLT finetuning on a curated parallel dataset. SSVP-SLT achieves state-of-the-art finetuned and zero-shot gloss-free SLT performance on the How2Sign dataset, outperforming the strongest respective baselines by over 3 BLEU-4. Based on controlled experiments, we further discuss the advantages and limitations of self-supervised pretraining and anonymization via facial obfuscation for SLT.
- [1538] arXiv:2402.09614 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Probabilistic Reasoning in Generative Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: This paper considers the challenges that Large Language Models (LLMs) face when reasoning over text that includes information involving uncertainty explicitly quantified via probability values. This type of reasoning is relevant to a variety of contexts ranging from everyday conversations to medical decision-making. Despite improvements in the mathematical reasoning capabilities of LLMs, they still exhibit significant difficulties when it comes to probabilistic reasoning. To deal with this problem, we first introduce the Bayesian Linguistic Inference Dataset (BLInD), a new dataset specifically designed to test the probabilistic reasoning capabilities of LLMs. We then leverage this new dataset to thoroughly illustrate the specific limitations of LLMs for tasks involving probabilistic reasoning and present several strategies that map the problem to different formal representations, including Python code, probabilistic inference algorithms, and probabilistic logical programming. We conclude by providing an evaluation of our methods on BLInD and on an adaptation of a causal reasoning question-answering dataset, which further shows their practical effectiveness.
- [1539] arXiv:2402.09615 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: API Pack: A Massive Multilingual Dataset for API Call GenerationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: We introduce API Pack, a multilingual dataset featuring over one million instruction-API call pairs aimed at advancing large language models' API call generation capabilities. Through experiments, we demonstrate API Pack's efficacy in enhancing models for this specialized task while maintaining their overall proficiency at general coding. Fine-tuning CodeLlama-13B on just 20,000 Python instances yields over 10% and 5% higher accuracy than GPT-3.5 and GPT-4 respectively in generating unseen API calls. Scaling to 100k examples improves generalization to new APIs not seen during training. In addition, cross-lingual API call generation is achieved without needing extensive data per language. The dataset, fine-tuned models, and overall code base are publicly available at this https URL .
- [1540] arXiv:2402.09649 (cross-list from cs.CE) [ pdf , ps , html , other ]
-
Title: ProtChatGPT: Towards Understanding Proteins with Large Language ModelsSubjects: Computational Engineering, Finance, and Science (cs.CE) ; Artificial Intelligence (cs.AI); Biomolecules (q-bio.BM)
Abstract: Protein research is crucial in various fundamental disciplines, but understanding their intricate structure-function relationships remains challenging. Recent Large Language Models (LLMs) have made significant strides in comprehending task-specific knowledge, suggesting the potential for ChatGPT-like systems specialized in protein to facilitate basic research. In this work, we introduce ProtChatGPT, which aims at learning and understanding protein structures via natural languages. ProtChatGPT enables users to upload proteins, ask questions, and engage in interactive conversations to produce comprehensive answers. The system comprises protein encoders, a Protein-Language Pertaining Transformer (PLP-former), a projection adapter, and an LLM. The protein first undergoes protein encoders and PLP-former to produce protein embeddings, which are then projected by the adapter to conform with the LLM. The LLM finally combines user questions with projected embeddings to generate informative answers. Experiments show that ProtChatGPT can produce promising responses to proteins and their corresponding questions. We hope that ProtChatGPT could form the basis for further exploration and application in protein research. Code and our pre-trained model will be publicly available.
- [1541] arXiv:2402.09664 (cross-list from cs.SE) [ pdf , ps , html , other ]
-
Title: CodeMind: A Framework to Challenge Large Language Models for Code ReasoningSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Programming Languages (cs.PL)
Abstract: Solely relying on test passing to evaluate Large Language Models (LLMs) for code synthesis may result in unfair assessment or promoting models with data leakage. As an alternative, we introduce CodeMind, a framework designed to gauge the code reasoning abilities of LLMs. CodeMind currently supports three code reasoning tasks: Independent Execution Reasoning (IER), Dependent Execution Reasoning (DER), and Specification Reasoning (SR). The first two evaluate models to predict the execution output of an arbitrary code or code the model could correctly synthesize. The third one evaluates the extent to which LLMs implement the specified expected behavior.
Our extensive evaluation of nine LLMs across five benchmarks in two different programming languages using CodeMind shows that LLMs fairly follow control flow constructs and, in general, explain how inputs evolve to output, specifically for simple programs and the ones they can correctly synthesize. However, their performance drops for code with higher complexity, non-trivial logical and arithmetic operators, non-primitive types, and API calls. Furthermore, we observe that, while correlated, specification reasoning (essential for code synthesis) does not imply execution reasoning (essential for broader programming tasks such as testing and debugging): ranking LLMs based on test passing can be different compared to code reasoning. - [1542] arXiv:2402.09668 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: How to Train Data-Efficient LLMsNoveen Sachdeva , Benjamin Coleman , Wang-Cheng Kang , Jianmo Ni , Lichan Hong , Ed H. Chi , James Caverlee , Julian McAuley , Derek Zhiyuan ChengComments: Under review. 44 pages, 30 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: The training of large language models (LLMs) is expensive. In this paper, we study data-efficient approaches for pre-training LLMs, i.e., techniques that aim to optimize the Pareto frontier of model quality and training resource/data consumption. We seek to understand the tradeoffs associated with data selection routines based on (i) expensive-to-compute data-quality estimates, and (ii) maximization of coverage and diversity-based measures in the feature space. Our first technique, Ask-LLM, leverages the zero-shot reasoning capabilities of instruction-tuned LLMs to directly assess the quality of a training example. To target coverage, we propose Density sampling, which models the data distribution to select a diverse sample. In our comparison of 19 samplers, involving hundreds of evaluation tasks and pre-training runs, we find that Ask-LLM and Density are the best methods in their respective categories. Coverage sampling can recover the performance of the full data, while models trained on Ask-LLM data consistently outperform full-data training -- even when we reject 90% of the original dataset, while converging up to 70% faster.
- [1543] arXiv:2402.09674 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: PAL: Proxy-Guided Black-Box Attack on Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Abstract: Large Language Models (LLMs) have surged in popularity in recent months, but they have demonstrated concerning capabilities to generate harmful content when manipulated. While techniques like safety fine-tuning aim to minimize harmful use, recent works have shown that LLMs remain vulnerable to attacks that elicit toxic responses. In this work, we introduce the Proxy-Guided Attack on LLMs (PAL), the first optimization-based attack on LLMs in a black-box query-only setting. In particular, it relies on a surrogate model to guide the optimization and a sophisticated loss designed for real-world LLM APIs. Our attack achieves 84% attack success rate (ASR) on GPT-3.5-Turbo and 48% on Llama-2-7B, compared to 4% for the current state of the art. We also propose GCG++, an improvement to the GCG attack that reaches 94% ASR on white-box Llama-2-7B, and the Random-Search Attack on LLMs (RAL), a strong but simple baseline for query-based attacks. We believe the techniques proposed in this work will enable more comprehensive safety testing of LLMs and, in the long term, the development of better security guardrails. The code can be found at this https URL .
- [1544] arXiv:2402.09683 (cross-list from cs.HC) [ pdf , ps , html , other ]
-
Title: Exploring a Behavioral Model of "Positive Friction" in Human-AI InteractionComments: This preprint has not undergone peer review or any post-submission corrections. The Version of Record of this contribution will be published in Springer Nature Computer Science book series in Volume HCI International 2024Journal-ref: DESIGN, USER EXPERIENCE AND USABILITY. HCII 2024Subjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: Designing seamless, frictionless user experiences has long been a dominant trend in both applied behavioral science and artificial intelligence (AI), in which the goal of making desirable actions easy and efficient informs efforts to minimize friction in user experiences. However, in some settings, friction can be genuinely beneficial, such as the insertion of deliberate delays to increase reflection, preventing individuals from resorting to automatic or biased behaviors, and enhancing opportunities for unexpected discoveries. More recently, the popularization and availability of AI on a widespread scale has only increased the need to examine how friction can help or hinder users of AI; it also suggests a need to consider how positive friction can benefit AI practitioners, both during development processes (e.g., working with diverse teams) and to inform how AI is designed into offerings. This paper first proposes a "positive friction" model that can help characterize how friction is currently beneficial in user and developer experiences with AI, diagnose the potential need for friction where it may not yet exist in these contexts, and inform how positive friction can be used to generate solutions, especially as advances in AI continue to be progress and new opportunities emerge. It then explores this model in the context of AI users and developers by proposing the value of taking a hybrid "AI+human" lens, and concludes by suggesting questions for further exploration.
- [1545] arXiv:2402.09695 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Reward Poisoning Attack Against Offline Reinforcement LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: We study the problem of reward poisoning attacks against general offline reinforcement learning with deep neural networks for function approximation. We consider a black-box threat model where the attacker is completely oblivious to the learning algorithm and its budget is limited by constraining both the amount of corruption at each data point, and the total perturbation. We propose an attack strategy called `policy contrast attack'. The high-level idea is to make some low-performing policies appear as high-performing while making high-performing policies appear as low-performing. To the best of our knowledge, we propose the first black-box reward poisoning attack in the general offline RL setting. We provide theoretical insights on the attack design and empirically show that our attack is efficient against current state-of-the-art offline RL algorithms in different kinds of learning datasets.
- [1546] arXiv:2402.09712 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Diffusion Model with Cross Attention as an Inductive Bias for DisentanglementSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Disentangled representation learning strives to extract the intrinsic factors within observed data. Factorizing these representations in an unsupervised manner is notably challenging and usually requires tailored loss functions or specific structural designs. In this paper, we introduce a new perspective and framework, demonstrating that diffusion models with cross-attention can serve as a powerful inductive bias to facilitate the learning of disentangled representations. We propose to encode an image to a set of concept tokens and treat them as the condition of the latent diffusion for image reconstruction, where cross-attention over the concept tokens is used to bridge the interaction between the encoder and diffusion. Without any additional regularization, this framework achieves superior disentanglement performance on the benchmark datasets, surpassing all previous methods with intricate designs. We have conducted comprehensive ablation studies and visualization analysis, shedding light on the functioning of this model. This is the first work to reveal the potent disentanglement capability of diffusion models with cross-attention, requiring no complex designs. We anticipate that our findings will inspire more investigation on exploring diffusion for disentangled representation learning towards more sophisticated data analysis and understanding.
- [1547] arXiv:2402.09721 (cross-list from cs.GT) [ pdf , ps , html , other ]
-
Title: Persuading a Learning AgentSubjects: Computer Science and Game Theory (cs.GT) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Theoretical Economics (econ.TH)
Abstract: We study a repeated Bayesian persuasion problem (and more generally, any generalized principal-agent problem with complete information) where the principal does not have commitment power and the agent uses algorithms to learn to respond to the principal's signals. We reduce this problem to a one-shot generalized principal-agent problem with an approximately-best-responding agent. This reduction allows us to show that: if the agent uses contextual no-regret learning algorithms, then the principal can guarantee a utility that is arbitrarily close to the principal's optimal utility in the classic non-learning model with commitment; if the agent uses contextual no-swap-regret learning algorithms, then the principal cannot obtain any utility significantly more than the optimal utility in the non-learning model with commitment. The difference between the principal's obtainable utility in the learning model and the non-learning model is bounded by the agent's regret (swap-regret). If the agent uses mean-based learning algorithms (which can be no-regret but not no-swap-regret), then the principal can do significantly better than the non-learning model. These conclusions hold not only for Bayesian persuasion, but also for any generalized principal-agent problem with complete information, including Stackelberg games and contract design.
- [1548] arXiv:2402.09722 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: Reg-NF: Efficient Registration of Implicit Surfaces within Neural FieldsComments: Accepted to ICRA 2024. The first two authors contributed equallySubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Neural fields, coordinate-based neural networks, have recently gained popularity for implicitly representing a scene. In contrast to classical methods that are based on explicit representations such as point clouds, neural fields provide a continuous scene representation able to represent 3D geometry and appearance in a way which is compact and ideal for robotics applications. However, limited prior methods have investigated registering multiple neural fields by directly utilising these continuous implicit representations. In this paper, we present Reg-NF, a neural fields-based registration that optimises for the relative 6-DoF transformation between two arbitrary neural fields, even if those two fields have different scale factors. Key components of Reg-NF include a bidirectional registration loss, multi-view surface sampling, and utilisation of volumetric signed distance functions (SDFs). We showcase our approach on a new neural field dataset for evaluating registration problems. We provide an exhaustive set of experiments and ablation studies to identify the performance of our approach, while also discussing limitations to provide future direction to the research community on open challenges in utilizing neural fields in unconstrained environments.
- [1549] arXiv:2402.09723 (cross-list from stat.ML) [ pdf , ps , html , other ]
-
Title: Best Arm Identification for Prompt Learning under a Limited BudgetSubjects: Machine Learning (stat.ML) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: The remarkable instruction-following capability of large language models (LLMs) has sparked a growing interest in automatically learning suitable prompts. However, while many effective methods have been proposed, the cost incurred during the learning process (e.g., accessing LLM and evaluating the responses) has not been considered. To overcome this limitation, this work explicitly incorporates a finite budget constraint into prompt learning. Towards developing principled solutions, a novel connection is established between prompt learning and fixed-budget best arm identification (BAI-FB) in multi-armed bandits (MAB). Based on this connection, a general framework TRIPLE (besT aRm Identification for Prompt LEarning) is proposed to harness the power of BAI-FB in prompt learning systematically. Unique characteristics of prompt learning further lead to two embedding-based enhancements of TRIPLE by exploiting the ideas of clustering and function approximation. Extensive experiments on multiple well-adopted tasks using both GPT 3.5 and Llama2 demonstrate the significant performance improvement of TRIPLE over the previous baselines while satisfying the limited budget constraints.
- [1550] arXiv:2402.09725 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Improving Non-autoregressive Machine Translation with Error Exposure and Consistency RegularizationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Being one of the IR-NAT (Iterative-refinemennt-based NAT) frameworks, the Conditional Masked Language Model (CMLM) adopts the mask-predict paradigm to re-predict the masked low-confidence tokens. However, CMLM suffers from the data distribution discrepancy between training and inference, where the observed tokens are generated differently in the two cases. In this paper, we address this problem with the training approaches of error exposure and consistency regularization (EECR). We construct the mixed sequences based on model prediction during training, and propose to optimize over the masked tokens under imperfect observation conditions. We also design a consistency learning method to constrain the data distribution for the masked tokens under different observing situations to narrow down the gap between training and inference. The experiments on five translation benchmarks obtains an average improvement of 0.68 and 0.40 BLEU scores compared to the base models, respectively, and our CMLMC-EECR achieves the best performance with a comparable translation quality with the Transformer. The experiments results demonstrate the effectiveness of our method.
- [1551] arXiv:2402.09727 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: A Human-Inspired Reading Agent with Gist Memory of Very Long ContextsComments: Website: this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract: Current Large Language Models (LLMs) are not only limited to some maximum context length, but also are not able to robustly consume long inputs. To address these limitations, we propose ReadAgent, an LLM agent system that increases effective context length up to 20x in our experiments. Inspired by how humans interactively read long documents, we implement ReadAgent as a simple prompting system that uses the advanced language capabilities of LLMs to (1) decide what content to store together in a memory episode, (2) compress those memory episodes into short episodic memories called gist memories, and (3) take actions to look up passages in the original text if ReadAgent needs to remind itself of relevant details to complete a task. We evaluate ReadAgent against baselines using retrieval methods, using the original long contexts, and using the gist memories. These evaluations are performed on three long-document reading comprehension tasks: QuALITY, NarrativeQA, and QMSum. ReadAgent outperforms the baselines on all three tasks while extending the effective context window by 3-20x.
- [1552] arXiv:2402.09728 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: AbuseGPT: Abuse of Generative AI ChatBots to Create Smishing CampaignsComments: 6 pages, 12 figures, published in ISDFS 2024Subjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: SMS phishing, also known as "smishing", is a growing threat that tricks users into disclosing private information or clicking into URLs with malicious content through fraudulent mobile text messages. In recent past, we have also observed a rapid advancement of conversational generative AI chatbot services (e.g., OpenAI's ChatGPT, Google's BARD), which are powered by pre-trained large language models (LLMs). These AI chatbots certainly have a lot of utilities but it is not systematically understood how they can play a role in creating threats and attacks. In this paper, we propose AbuseGPT method to show how the existing generative AI-based chatbot services can be exploited by attackers in real world to create smishing texts and eventually lead to craftier smishing campaigns. To the best of our knowledge, there is no pre-existing work that evidently shows the impacts of these generative text-based models on creating SMS phishing. Thus, we believe this study is the first of its kind to shed light on this emerging cybersecurity threat. We have found strong empirical evidences to show that attackers can exploit ethical standards in the existing generative AI-based chatbot services by crafting prompt injection attacks to create newer smishing campaigns. We also discuss some future research directions and guidelines to protect the abuse of generative AI-based services and safeguard users from smishing attacks.
- [1553] arXiv:2402.09746 (cross-list from q-fin.CP) [ pdf , ps , other ]
-
Title: Alpha-GPT 2.0: Human-in-the-Loop AI for Quantitative InvestmentSubjects: Computational Finance (q-fin.CP) ; Artificial Intelligence (cs.AI)
Abstract: Recently, we introduced a new paradigm for alpha mining in the realm of quantitative investment, developing a new interactive alpha mining system framework, Alpha-GPT. This system is centered on iterative Human-AI interaction based on large language models, introducing a Human-in-the-Loop approach to alpha discovery. In this paper, we present the next-generation Alpha-GPT 2.0 \footnote{Draft. Work in progress}, a quantitative investment framework that further encompasses crucial modeling and analysis phases in quantitative investment. This framework emphasizes the iterative, interactive research between humans and AI, embodying a Human-in-the-Loop strategy throughout the entire quantitative investment pipeline. By assimilating the insights of human researchers into the systematic alpha research process, we effectively leverage the Human-in-the-Loop approach, enhancing the efficiency and precision of quantitative investment research.
- [1554] arXiv:2402.09748 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Model Compression and Efficient Inference for Large Language Models: A SurveyWenxiao Wang , Wei Chen , Yicong Luo , Yongliu Long , Zhengkai Lin , Liye Zhang , Binbin Lin , Deng Cai , Xiaofei HeComments: 47 pages, review 380 papers. The work is ongoingSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Performance (cs.PF)
Abstract: Transformer based large language models have achieved tremendous success. However, the significant memory and computational costs incurred during the inference process make it challenging to deploy large models on resource-constrained devices. In this paper, we investigate compression and efficient inference methods for large language models from an algorithmic perspective. Regarding taxonomy, similar to smaller models, compression and acceleration algorithms for large language models can still be categorized into quantization, pruning, distillation, compact architecture design, dynamic networks. However, Large language models have two prominent characteristics compared to smaller models: (1) Most of compression algorithms require finetuning or even retraining the model after compression. The most notable aspect of large models is the very high cost associated with model finetuning or training. Therefore, many algorithms for large models, such as quantization and pruning, start to explore tuning-free algorithms. (2) Large models emphasize versatility and generalization rather than performance on a single task. Hence, many algorithms, such as knowledge distillation, focus on how to preserving their versatility and generalization after compression. Since these two characteristics were not very pronounced in early large models, we further distinguish large language models into medium models and ``real'' large models. Additionally, we also provide an introduction to some mature frameworks for efficient inference of large models, which can support basic compression or acceleration algorithms, greatly facilitating model deployment for users.
- [1555] arXiv:2402.09750 (cross-list from cs.HC) [ pdf , ps , html , other ]
-
Title: Exploring the Potential of Large Language Models in Artistic Creation: Collaboration and Reflection on Creative ProgrammingComments: 15 pages, 4 figuresSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Recently, the potential of large language models (LLMs) has been widely used in assisting programming. However, current research does not explore the artist potential of LLMs in creative coding within artist and AI collaboration. Our work probes the reflection type of artists in the creation process with such collaboration. We compare two common collaboration approaches: invoking the entire program and multiple subtasks. Our findings exhibit artists' different stimulated reflections in two different methods. Our finding also shows the correlation of reflection type with user performance, user satisfaction, and subjective experience in two collaborations through conducting two methods, including experimental data and qualitative interviews. In this sense, our work reveals the artistic potential of LLM in creative coding. Meanwhile, we provide a critical lens of human-AI collaboration from the artists' perspective and expound design suggestions for future work of AI-assisted creative tasks.
- [1556] arXiv:2402.09759 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Efficient Language Adaptive Pre-training: Extending State-of-the-Art Large Language Models for PolishComments: 10 pagesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: This study explores the potential of fine-tuning foundational English Large Language Models (LLMs) for generating Polish text. The first step involves Language Adaptive Pre-training (LAPT) on a high-quality dataset of 3.11 GB, consisting of 276 million Polish tokens. The LAPT is followed by additional fine-tuning aimed at solving nine KLEJ challenges. Our trained model Curie-7B-v1 not only generates Polish text with the lowest perplexity of 3.02 among decoder-based Polish models but also closely rivals the performance of the best Polish encoder-decoder models with a less than 2% gap on 8 out of 9 tasks. Curie-7B-v1 used approximately 2-3% of a typical dataset size to learn Polish. The LAPT was completed in less than five days using a consumer GPU, highlighting the method's efficiency. The proficiency of the model in Polish was significantly enhanced, demonstrating the viability of this approach for adding new languages to existing LLMs by training just 1.2% of its parameters. To contribute to the community's collaborative progress, the model has been released as open-source.
- [1557] arXiv:2402.09760 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Grounding Language Model with Chunking-Free In-Context RetrievalSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract: This paper presents a novel Chunking-Free In-Context (CFIC) retrieval approach, specifically tailored for Retrieval-Augmented Generation (RAG) systems. Traditional RAG systems often struggle with grounding responses using precise evidence text due to the challenges of processing lengthy documents and filtering out irrelevant content. Commonly employed solutions, such as document chunking and adapting language models to handle longer contexts, have their limitations. These methods either disrupt the semantic coherence of the text or fail to effectively address the issues of noise and inaccuracy in evidence retrieval.
CFIC addresses these challenges by circumventing the conventional chunking process. It utilizes the encoded hidden states of documents for in-context retrieval, employing auto-aggressive decoding to accurately identify the specific evidence text required for user queries, eliminating the need for chunking. CFIC is further enhanced by incorporating two decoding strategies, namely Constrained Sentence Prefix Decoding and Skip Decoding. These strategies not only improve the efficiency of the retrieval process but also ensure that the fidelity of the generated grounding text evidence is maintained. Our evaluations of CFIC on a range of open QA datasets demonstrate its superiority in retrieving relevant and accurate evidence, offering a significant improvement over traditional methods. By doing away with the need for document chunking, CFIC presents a more streamlined, effective, and efficient retrieval solution, making it a valuable advancement in the field of RAG systems. - [1558] arXiv:2402.09766 (cross-list from cs.IR) [ pdf , ps , other ]
-
Title: From Variability to Stability: Advancing RecSys Benchmarking PracticesValeriy Shevchenko , Nikita Belousov , Alexey Vasilev , Vladimir Zholobov , Artyom Sosedka , Natalia Semenova , Anna Volodkevich , Andrey Savchenko , Alexey ZaytsevComments: 8 pages with 11 figuresSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: In the rapidly evolving domain of Recommender Systems (RecSys), new algorithms frequently claim state-of-the-art performance based on evaluations over a limited set of arbitrarily selected datasets. However, this approach may fail to holistically reflect their effectiveness due to the significant impact of dataset characteristics on algorithm performance. Addressing this deficiency, this paper introduces a novel benchmarking methodology to facilitate a fair and robust comparison of RecSys algorithms, thereby advancing evaluation practices. By utilizing a diverse set of $30$ open datasets, including two introduced in this work, and evaluating $11$ collaborative filtering algorithms across $9$ metrics, we critically examine the influence of dataset characteristics on algorithm performance. We further investigate the feasibility of aggregating outcomes from multiple datasets into a unified ranking. Through rigorous experimental analysis, we validate the reliability of our methodology under the variability of datasets, offering a benchmarking strategy that balances quality and computational demands. This methodology enables a fair yet effective means of evaluating RecSys algorithms, providing valuable guidance for future research endeavors.
- [1559] arXiv:2402.09782 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: MC-DBN: A Deep Belief Network-Based Model for Modality CompletionJournal-ref: International Conference on Computer Supported Cooperative Work in Design 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Recent advancements in multi-modal artificial intelligence (AI) have revolutionized the fields of stock market forecasting and heart rate monitoring. Utilizing diverse data sources can substantially improve prediction accuracy. Nonetheless, additional data may not always align with the original dataset. Interpolation methods are commonly utilized for handling missing values in modal data, though they may exhibit limitations in the context of sparse information. Addressing this challenge, we propose a Modality Completion Deep Belief Network-Based Model (MC-DBN). This approach utilizes implicit features of complete data to compensate for gaps between itself and additional incomplete data. It ensures that the enhanced multi-modal data closely aligns with the dynamic nature of the real world to enhance the effectiveness of the model. We conduct evaluations of the MC-DBN model in two datasets from the stock market forecasting and heart rate monitoring domains. Comprehensive experiments showcase the model's capacity to bridge the semantic divide present in multi-modal data, subsequently enhancing its performance. The source code is available at: this https URL
- [1560] arXiv:2402.09784 (cross-list from cs.IR) [ pdf , ps , other ]
-
Title: Sequential Recommendation on Temporal Proximities with Contrastive Learning and Self-AttentionComments: 10 pages, 9 figuresSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI)
Abstract: Sequential recommender systems identify user preferences from their past interactions to predict subsequent items optimally. Although traditional deep-learning-based models and modern transformer-based models in previous studies capture unidirectional and bidirectional patterns within user-item interactions, the importance of temporal contexts, such as individual behavioral and societal trend patterns, remains underexplored. Notably, recent models often neglect similarities in users' actions that occur implicitly among users during analogous timeframes-a concept we term vertical temporal proximity. These models primarily adapt the self-attention mechanisms of the transformer to consider the temporal context in individual user actions. Meanwhile, this adaptation still remains limited in considering the horizontal temporal proximity within item interactions, like distinguishing between subsequent item purchases within a week versus a month. To address these gaps, we propose a sequential recommendation model called TemProxRec, which includes contrastive learning and self-attention methods to consider temporal proximities both across and within user-item interactions. The proposed contrastive learning method learns representations of items selected in close temporal periods across different users to be close. Simultaneously, the proposed self-attention mechanism encodes temporal and positional contexts in a user sequence using both absolute and relative embeddings. This way, our TemProxRec accurately predicts the relevant items based on the user-item interactions within a specific timeframe. We validate this work through comprehensive experiments on TemProxRec, consistently outperforming existing models on benchmark datasets as well as showing the significance of considering the vertical and horizontal temporal proximities into sequential recommendation.
- [1561] arXiv:2402.09786 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Examining Pathological Bias in a Generative Adversarial Network Discriminator: A Case Study on a StyleGAN3 ModelAlvin Grissom II , Ryan F. Lei , Matt Gusdorff , Jeova Farias Sales Rocha Neto , Bailey Lin , Ryan TrotterSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Abstract: Generative adversarial networks (GANs) generate photorealistic faces that are often indistinguishable by humans from real faces. While biases in machine learning models are often assumed to be due to biases in training data, we find pathological internal color and luminance biases in the discriminator of a pre-trained StyleGAN3-r model that are not explicable by the training data. We also find that the discriminator systematically stratifies scores by both image- and face-level qualities and that this disproportionately affects images across gender, race, and other categories. We examine axes common in research on stereotyping in social psychology.
- [1562] arXiv:2402.09792 (cross-list from cs.NE) [ pdf , ps , other ]
-
Title: System-level Impact of Non-Ideal Program-Time of Charge Trap Flash (CTF) on Deep Neural NetworkSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Image and Video Processing (eess.IV)
Abstract: Learning of deep neural networks (DNN) using Resistive Processing Unit (RPU) architecture is energy-efficient as it utilizes dedicated neuromorphic hardware and stochastic computation of weight updates for in-memory computing. Charge Trap Flash (CTF) devices can implement RPU-based weight updates in DNNs. However, prior work has shown that the weight updates (V_T) in CTF-based RPU are impacted by the non-ideal program time of CTF. The non-ideal program time is affected by two factors of CTF. Firstly, the effects of the number of input pulses (N) or pulse width (pw), and secondly, the gap between successive update pulses (t_gap) used for the stochastic computation of weight updates. Therefore, the impact of this non-ideal program time must be studied for neural network training simulations. In this study, Firstly, we propose a pulse-train design compensation technique to reduce the total error caused by non-ideal program time of CTF and stochastic variance of a network. Secondly, we simulate RPU-based DNN with non-ideal program time of CTF on MNIST and Fashion-MNIST datasets. We find that for larger N (~1000), learning performance approaches the ideal (software-level) training level and, therefore, is not much impacted by the choice of t_gap used to implement RPU-based weight updates. However, for lower N (<500), learning performance depends on T_gap of the pulses. Finally, we also performed an ablation study to isolate the causal factor of the improved learning performance. We conclude that the lower noise level in the weight updates is the most likely significant factor to improve the learning performance of DNN. Thus, our study attempts to compensate for the error caused by non-ideal program time and standardize the pulse length (N) and pulse gap (t_gap) specifications for CTF-based RPUs for accurate system-level on-chip training.
- [1563] arXiv:2402.09795 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: An advanced data fabric architecture leveraging homomorphic encryption and federated learningSakib Anwar Rieyan , Md. Raisul Kabir News , A.B.M. Muntasir Rahman , Sadia Afrin Khan , Sultan Tasneem Jawad Zaarif , Md. Golam Rabiul Alam , Mohammad Mehedi Hassan , Michele Ianni , Giancarlo FortinoJournal-ref: Information Fusion, 102, 102004 (2024)Subjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Databases (cs.DB)
Abstract: Data fabric is an automated and AI-driven data fusion approach to accomplish data management unification without moving data to a centralized location for solving complex data problems. In a Federated learning architecture, the global model is trained based on the learned parameters of several local models that eliminate the necessity of moving data to a centralized repository for machine learning. This paper introduces a secure approach for medical image analysis using federated learning and partially homomorphic encryption within a distributed data fabric architecture. With this method, multiple parties can collaborate in training a machine-learning model without exchanging raw data but using the learned or fused features. The approach complies with laws and regulations such as HIPAA and GDPR, ensuring the privacy and security of the data. The study demonstrates the method's effectiveness through a case study on pituitary tumor classification, achieving a significant level of accuracy. However, the primary focus of the study is on the development and evaluation of federated learning and partially homomorphic encryption as tools for secure medical image analysis. The results highlight the potential of these techniques to be applied to other privacy-sensitive domains and contribute to the growing body of research on secure and privacy-preserving machine learning.
- [1564] arXiv:2402.09820 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: Utilizing Deep Learning for Enhancing Network Resilience in FinanceSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); General Finance (q-fin.GN)
Abstract: In the age of the Internet, people's lives are increasingly dependent on today's network technology. Maintaining network integrity and protecting the legitimate interests of users is at the heart of network construction. Threat detection is an important part of a complete and effective defense system. How to effectively detect unknown threats is one of the concerns of network protection. Currently, network threat detection is usually based on rules and traditional machine learning methods, which create artificial rules or extract common spatiotemporal features, which cannot be applied to large-scale data applications, and the emergence of unknown risks causes the detection accuracy of the original model to decline. With this in mind, this paper uses deep learning for advanced threat detection to improve protective measures in the financial industry. Many network researchers have shifted their focus to exception-based intrusion detection techniques. The detection technology mainly uses statistical machine learning methods - collecting normal program and network behavior data, extracting multidimensional features, and training decision machine learning models on this basis (commonly used include naive Bayes, decision trees, support vector machines, random forests, etc.).
- [1565] arXiv:2402.09830 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Utilizing GANs for Fraud Detection: Model Training with Synthetic Transaction DataSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)
Abstract: Anomaly detection is a critical challenge across various research domains, aiming to identify instances that deviate from normal data distributions. This paper explores the application of Generative Adversarial Networks (GANs) in fraud detection, comparing their advantages with traditional methods. GANs, a type of Artificial Neural Network (ANN), have shown promise in modeling complex data distributions, making them effective tools for anomaly detection. The paper systematically describes the principles of GANs and their derivative models, emphasizing their application in fraud detection across different datasets. And by building a collection of adversarial verification graphs, we will effectively prevent fraud caused by bots or automated systems and ensure that the users in the transaction are real. The objective of the experiment is to design and implement a fake face verification code and fraud detection system based on Generative Adversarial network (GANs) algorithm to enhance the security of the transaction process.The study demonstrates the potential of GANs in enhancing transaction security through deep learning techniques.
- [1566] arXiv:2402.09867 (cross-list from eess.SP) [ pdf , ps , other ]
-
Title: Characterizing Accuracy Trade-offs of EEG Applications on Embedded HMPsComments: 7 pages, 10 figuresSubjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Performance (cs.PF)
Abstract: Electroencephalography (EEG) recordings are analyzed using battery-powered wearable devices to monitor brain activities and neurological disorders. These applications require long and continuous processing to generate feasible results. However, wearable devices are constrained with limited energy and computation resources, owing to their small sizes for practical use cases. Embedded heterogeneous multi-core platforms (HMPs) can provide better performance within limited energy budgets for EEG applications. Error resilience of the EEG application pipeline can be exploited further to maximize the performance and energy gains with HMPs. However, disciplined tuning of approximation on embedded HMPs requires a thorough exploration of the accuracy-performance-power trade-off space. In this work, we characterize the error resilience of three EEG applications, including Epileptic Seizure Detection, Sleep Stage Classification, and Stress Detection on the real-world embedded HMP test-bed of the Odroid XU3 platform. We present a combinatorial evaluation of power-performance-accuracy trade-offs of EEG applications at different approximation, power, and performance levels to provide insights into the disciplined tuning of approximation in EEG applications on embedded platforms.
- [1567] arXiv:2402.09871 (cross-list from cs.SD) [ pdf , ps , html , other ]
-
Title: MuChin: A Chinese Colloquial Description Benchmark for Evaluating Language Models in the Field of MusicZihao Wang , Shuyu Li , Tao Zhang , Qi Wang , Pengfei Yu , Jinyang Luo , Yan Liu , Ming Xi , Kejun ZhangComments: Accepted by International Joint Conference on Artificial Intelligence 2024 (IJCAI 2024)Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
Abstract: The rapidly evolving multimodal Large Language Models (LLMs) urgently require new benchmarks to uniformly evaluate their performance on understanding and textually describing music. However, due to semantic gaps between Music Information Retrieval (MIR) algorithms and human understanding, discrepancies between professionals and the public, and low precision of annotations, existing music description datasets cannot serve as benchmarks. To this end, we present MuChin, the first open-source music description benchmark in Chinese colloquial language, designed to evaluate the performance of multimodal LLMs in understanding and describing music. We established the Caichong Music Annotation Platform (CaiMAP) that employs an innovative multi-person, multi-stage assurance method, and recruited both amateurs and professionals to ensure the precision of annotations and alignment with popular semantics. Utilizing this method, we built a dataset with multi-dimensional, high-precision music annotations, the Caichong Music Dataset (CaiMD), and carefully selected 1,000 high-quality entries to serve as the test set for MuChin. Based on MuChin, we analyzed the discrepancies between professionals and amateurs in terms of music description, and empirically demonstrated the effectiveness of annotated data for fine-tuning LLMs. Ultimately, we employed MuChin to evaluate existing music understanding models on their ability to provide colloquial descriptions of music. All data related to the benchmark, along with the scoring code and detailed appendices, have been open-sourced ( this https URL ).
- [1568] arXiv:2402.09883 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Lester: rotoscope animation through video object segmentation and trackingSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Graphics (cs.GR); Multimedia (cs.MM)
Abstract: This article introduces Lester, a novel method to automatically synthetise retro-style 2D animations from videos. The method approaches the challenge mainly as an object segmentation and tracking problem. Video frames are processed with the Segment Anything Model (SAM) and the resulting masks are tracked through subsequent frames with DeAOT, a method of hierarchical propagation for semi-supervised video object segmentation. The geometry of the masks' contours is simplified with the Douglas-Peucker algorithm. Finally, facial traits, pixelation and a basic shadow effect can be optionally added. The results show that the method exhibits an excellent temporal consistency and can correctly process videos with different poses and appearances, dynamic shots, partial shots and diverse backgrounds. The proposed method provides a more simple and deterministic approach than diffusion models based video-to-video translation pipelines, which suffer from temporal consistency problems and do not cope well with pixelated and schematic outputs. The method is also much most practical than techniques based on 3D human pose estimation, which require custom handcrafted 3D models and are very limited with respect to the type of scenes they can process.
- [1569] arXiv:2402.09894 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: Not Just Novelty: A Longitudinal Study on Utility and Customization of AI WorkflowsComments: 21 pages, 11 figuresSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)
Abstract: Generative AI brings novel and impressive abilities to help people in everyday tasks. There are many AI workflows that solve real and complex problems by chaining AI outputs together with human interaction. Although there is an undeniable lure of AI, it's uncertain how useful generative AI workflows are after the novelty wears off. Additionally, tools built with generative AI have the potential to be personalized and adapted quickly and easily, but do users take advantage of the potential to customize? We conducted a three-week longitudinal study with 12 users to understand the familiarization and customization of generative AI tools for science communication. Our study revealed that the familiarization phase lasts for 4.3 sessions, where users explore the capabilities of the workflow and which aspects they find useful. After familiarization, the perceived utility of the system is rated higher than before, indicating that the perceived utility of AI is not just a novelty effect. The increase in benefits mainly comes from end-users' ability to customize prompts, and thus appropriate the system to their own needs. This points to a future where generative AI systems can allow us to design for appropriation.
- [1570] arXiv:2402.09900 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Revisiting Recurrent Reinforcement Learning with Memory MonoidsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Memory models such as Recurrent Neural Networks (RNNs) and Transformers address Partially Observable Markov Decision Processes (POMDPs) by mapping trajectories to latent Markov states. Neither model scales particularly well to long sequences, especially compared to an emerging class of memory models sometimes called linear recurrent models. We discover that we can model the recurrent update of these models using a monoid, leading us to reformulate existing models using a novel memory monoid framework. We revisit the traditional approach to batching in recurrent RL, highlighting both theoretical and empirical deficiencies. We leverage the properties of memory monoids to propose a batching method that improves sample efficiency, increases the return, and simplifies the implementation of recurrent loss functions in RL.
- [1571] arXiv:2402.09906 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Generative Representational Instruction TuningNiklas Muennighoff , Hongjin Su , Liang Wang , Nan Yang , Furu Wei , Tao Yu , Amanpreet Singh , Douwe KielaComments: 66 pages (16 main), 25 figures, 34 tablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: All text-based language problems can be reduced to either generation or embedding. Current models only perform well at one or the other. We introduce generative representational instruction tuning (GRIT) whereby a large language model is trained to handle both generative and embedding tasks by distinguishing between them through instructions. Compared to other open models, our resulting GritLM 7B sets a new state of the art on the Massive Text Embedding Benchmark (MTEB) and outperforms all models up to its size on a range of generative tasks. By scaling up further, GritLM 8x7B outperforms all open generative language models that we tried while still being among the best embedding models. Notably, we find that GRIT matches training on only generative or embedding data, thus we can unify both at no performance loss. Among other benefits, the unification via GRIT speeds up Retrieval-Augmented Generation (RAG) by > 60% for long documents, by no longer requiring separate retrieval and generation models. Models, code, etc. are freely available at this https URL .
- [1572] arXiv:2402.09911 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Enhancing Large Language Models with Pseudo- and Multisource- Knowledge Graphs for Open-ended Question AnsweringSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Mitigating the hallucinations of Large Language Models (LLMs) and enhancing them is a crucial task. Although some existing methods employ model self-enhancement techniques, they fall short of effectively addressing unknown factual hallucinations. Using Knowledge Graph (KG) enhancement approaches fails to address the generalization across different KG sources and the enhancement of open-ended answer questions simultaneously. To tackle these limitations, there is a framework that combines Pseudo-Graph Generation and Atomic Knowledge Verification proposed. The enhancement of LLM using KG in an open-ended question-answering setting is implemented by leveraging the Pseudo-Graph Generation. Atomic Knowledge Verification utilizes atomic-level knowledge querying and verification to achieve generalizability under different KG sources. Compared to the baseline, this approach yields a minimum improvement of 11.5 in the ROUGE-L score for open-ended questions. For precise questions, we observe a minimum accuracy improvement of 7.5. Moreover, there is also demonstration that this framework exhibits generalizability across different KG sources. In summary, our results pave the way for enhancing LLMs by incorporating Pseudo- and Multisource-KGs, particularly in the context of open-ended questions.
- [1573] arXiv:2402.09921 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Identifying and modelling cognitive biases in mobility choicesComments: M1 internship report from Univ. Lyon 1 Claude Bernard. Internship was from October 2022 to June 2023Subjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Abstract: This report presents results from an M1 internship dedicated to agent-based modelling and simulation of daily mobility choices. This simulation is intended to be realistic enough to serve as a basis for a serious game about the mobility transition. In order to ensure this level of realism, we conducted a survey to measure if real mobility choices are made rationally, or how biased they are. Results analysed here show that various biases could play a role in decisions. We then propose an implementation in a GAMA agent-based simulation.
- [1574] arXiv:2402.09923 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: A Dataset of Open-Domain Question Answering with Multiple-Span AnswersSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Multi-span answer extraction, also known as the task of multi-span question answering (MSQA), is critical for real-world applications, as it requires extracting multiple pieces of information from a text to answer complex questions. Despite the active studies and rapid progress in English MSQA research, there is a notable lack of publicly available MSQA benchmark in Chinese. Previous efforts for constructing MSQA datasets predominantly emphasized entity-centric contextualization, resulting in a bias towards collecting factoid questions and potentially overlooking questions requiring more detailed descriptive responses. To overcome these limitations, we present CLEAN, a comprehensive Chinese multi-span question answering dataset that involves a wide range of open-domain subjects with a substantial number of instances requiring descriptive answers. Additionally, we provide established models from relevant literature as baselines for CLEAN. Experimental results and analysis show the characteristics and challenge of the newly proposed CLEAN dataset for the community. Our dataset, CLEAN, will be publicly released at zhiyiluo.site/misc/clean_v1.0_ sample.json.
- [1575] arXiv:2402.09934 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Paying Attention to Deflections: Mining Pragmatic Nuances for Whataboutism Detection in Online DiscourseComments: 14 pages, 5 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Whataboutism, a potent tool for disrupting narratives and sowing distrust, remains under-explored in quantitative NLP research. Moreover, past work has not distinguished its use as a strategy for misinformation and propaganda from its use as a tool for pragmatic and semantic framing. We introduce new datasets from Twitter and YouTube, revealing overlaps as well as distinctions between whataboutism, propaganda, and the tu quoque fallacy. Furthermore, drawing on recent work in linguistic semantics, we differentiate the `what about' lexical construct from whataboutism. Our experiments bring to light unique challenges in its accurate detection, prompting the introduction of a novel method using attention weights for negative sample mining. We report significant improvements of 4% and 10% over previous state-of-the-art methods in our Twitter and YouTube collections, respectively.
- [1576] arXiv:2402.09941 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: FedLion: Faster Adaptive Federated Optimization with Fewer CommunicationComments: ICASSP 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: In Federated Learning (FL), a framework to train machine learning models across distributed data, well-known algorithms like FedAvg tend to have slow convergence rates, resulting in high communication costs during training. To address this challenge, we introduce FedLion, an adaptive federated optimization algorithm that seamlessly incorporates key elements from the recently proposed centralized adaptive algorithm, Lion (Chen et al. 2o23), into the FL framework. Through comprehensive evaluations on two widely adopted FL benchmarks, we demonstrate that FedLion outperforms previous state-of-the-art adaptive algorithms, including FAFED (Wu et al. 2023) and FedDA. Moreover, thanks to the use of signed gradients in local training, FedLion substantially reduces data transmission requirements during uplink communication when compared to existing adaptive algorithms, further reducing communication costs. Last but not least, this work also includes a novel theoretical analysis, showcasing that FedLion attains faster convergence rate than established FL algorithms like FedAvg.
- [1577] arXiv:2402.09965 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Hierarchy Representation of Data in Machine LearningsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: When there are models with clear-cut judgment results for several data points, it is possible that most models exhibit a relationship where if they correctly judge one target, they also correctly judge another target. Conversely, if most models incorrectly judge one target, they may also incorrectly judge another target. We propose a method for visualizing this hierarchy among targets. This information is expected to be beneficial for model improvement.
- [1578] arXiv:2402.09977 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Fast Vocabulary Transfer for Language Model CompressionComments: The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022)Journal-ref: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022): Industry TrackSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Real-world business applications require a trade-off between language model performance and size. We propose a new method for model compression that relies on vocabulary transfer. We evaluate the method on various vertical domains and downstream tasks. Our results indicate that vocabulary transfer can be effectively used in combination with other compression techniques, yielding a significant reduction in model size and inference time while marginally compromising on performance.
- [1579] arXiv:2402.09982 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Data Augmentation and Transfer Learning Approaches Applied to Facial Expressions RecognitionComments: The 11th International Conference on Artificial Intelligence, Soft Computing and Applications (AIAA 2021)Journal-ref: Proceeding of the 11th International Conference on Artificial Intelligence, Soft Computing and Applications (AIAA 2021)Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The face expression is the first thing we pay attention to when we want to understand a person's state of mind. Thus, the ability to recognize facial expressions in an automatic way is a very interesting research field. In this paper, because the small size of available training datasets, we propose a novel data augmentation technique that improves the performances in the recognition task. We apply geometrical transformations and build from scratch GAN models able to generate new synthetic images for each emotion type. Thus, on the augmented datasets we fine tune pretrained convolutional neural networks with different architectures. To measure the generalization ability of the models, we apply extra-database protocol approach, namely we train models on the augmented versions of training dataset and test them on two different databases. The combination of these techniques allows to reach average accuracy values of the order of 85\% for the InceptionResNetV2 model.
- [1580] arXiv:2402.09984 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Symmetry-Breaking Augmentations for Ad Hoc TeamworkComments: Currently in review for ICML 2024. 16 pages (including references and appendix), 9 Figures, 11 tablesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In many collaborative settings, artificial intelligence (AI) agents must be able to adapt to new teammates that use unknown or previously unobserved strategies. While often simple for humans, this can be challenging for AI agents. For example, if an AI agent learns to drive alongside others (a training set) that only drive on one side of the road, it may struggle to adapt this experience to coordinate with drivers on the opposite side, even if their behaviours are simply flipped along the left-right symmetry. To address this we introduce symmetry-breaking augmentations (SBA), which increases diversity in the behaviour of training teammates by applying a symmetry-flipping operation. By learning a best-response to the augmented set of teammates, our agent is exposed to a wider range of behavioural conventions, improving performance when deployed with novel teammates. We demonstrate this experimentally in two settings, and show that our approach improves upon previous ad hoc teamwork results in the challenging card game Hanabi. We also propose a general metric for estimating symmetry-dependency amongst a given set of policies.
- [1581] arXiv:2402.10002 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: MM-Point: Multi-View Information-Enhanced Multi-Modal Self-Supervised 3D Point Cloud UnderstandingComments: Accepted by AAAI 2024Journal-ref: AAAI 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Multimedia (cs.MM)
Abstract: In perception, multiple sensory information is integrated to map visual information from 2D views onto 3D objects, which is beneficial for understanding in 3D environments. But in terms of a single 2D view rendered from different angles, only limited partial information can be provided.The richness and value of Multi-view 2D information can provide superior self-supervised signals for 3D objects. In this paper, we propose a novel self-supervised point cloud representation learning method, MM-Point, which is driven by intra-modal and inter-modal similarity objectives. The core of MM-Point lies in the Multi-modal interaction and transmission between 3D objects and multiple 2D views at the same time. In order to more effectively simultaneously perform the consistent cross-modal objective of 2D multi-view information based on contrastive learning, we further propose Multi-MLP and Multi-level Augmentation strategies. Through carefully designed transformation strategies, we further learn Multi-level invariance in 2D Multi-views. MM-Point demonstrates state-of-the-art (SOTA) performance in various downstream tasks. For instance, it achieves a peak accuracy of 92.4% on the synthetic dataset ModelNet40, and a top accuracy of 87.8% on the real-world dataset ScanObjectNN, comparable to fully supervised methods. Additionally, we demonstrate its effectiveness in tasks such as few-shot classification, 3D part segmentation and 3D semantic segmentation.
- [1582] arXiv:2402.10005 (cross-list from cs.SD) [ pdf , ps , html , other ]
-
Title: ML-ASPA: A Contemplation of Machine Learning-based Acoustic Signal Processing Analysis for Sounds, & Strains Emerging TechnologyRatul Ali , Aktarul Islam , Md. Shohel Rana , Saila Nasrin , Sohel Afzal Shajol , Professor Dr. A.H.M. Saifullah SadiComments: 7 pages, 5 figures, ArticleSubjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Abstract: Acoustic data serves as a fundamental cornerstone in advancing scientific and engineering understanding across diverse disciplines, spanning biology, communications, and ocean and Earth science. This inquiry meticulously explores recent advancements and transformative potential within the domain of acoustics, specifically focusing on machine learning (ML) and deep learning. ML, comprising an extensive array of statistical techniques, proves indispensable for autonomously discerning and leveraging patterns within data. In contrast to traditional acoustics and signal processing, ML adopts a data-driven approach, unveiling intricate relationships between features and desired labels or actions, as well as among features themselves, given ample training data. The application of ML to expansive sets of training data facilitates the discovery of models elucidating complex acoustic phenomena such as human speech and reverberation. The dynamic evolution of ML in acoustics yields compelling results and holds substantial promise for the future. The advent of electronic stethoscopes and analogous recording and data logging devices has expanded the application of acoustic signal processing concepts to the analysis of bowel sounds. This paper critically reviews existing literature on acoustic signal processing for bowel sound analysis, outlining fundamental approaches and applicable machine learning principles. It chronicles historical progress in signal processing techniques that have facilitated the extraction of valuable information from bowel sounds, emphasizing advancements in noise reduction, segmentation, signal enhancement, feature extraction, sound localization, and machine learning techniques...
- [1583] arXiv:2402.10024 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Self-Augmented In-Context Learning for Unsupervised Word TranslationComments: 10 Pages, 3 Figures, 7 TablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Abstract: Recent work has shown that, while large language models (LLMs) demonstrate strong word translation or bilingual lexicon induction (BLI) capabilities in few-shot setups, they still cannot match the performance of 'traditional' mapping-based approaches in the unsupervised scenario where no seed translation pairs are available, especially for lower-resource languages. To address this challenge with LLMs, we propose self-augmented in-context learning (SAIL) for unsupervised BLI: starting from a zero-shot prompt, SAIL iteratively induces a set of high-confidence word translation pairs for in-context learning (ICL) from an LLM, which it then reapplies to the same LLM in the ICL fashion. Our method shows substantial gains over zero-shot prompting of LLMs on two established BLI benchmarks spanning a wide range of language pairs, also outperforming mapping-based baselines across the board. In addition to achieving state-of-the-art unsupervised BLI performance, we also conduct comprehensive analyses on SAIL and discuss its limitations.
- [1584] arXiv:2402.10028 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Diffusion Models Meet Contextual Bandits with Large Action SpacesComments: 26 pages, 5 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: Efficient exploration is a key challenge in contextual bandits due to the large size of their action space, where uninformed exploration can result in computational and statistical inefficiencies. Fortunately, the rewards of actions are often correlated and this can be leveraged to explore them efficiently. In this work, we capture such correlations using pre-trained diffusion models; upon which we design diffusion Thompson sampling (dTS). Both theoretical and algorithmic foundations are developed for dTS, and empirical evaluation also shows its favorable performance.
- [1585] arXiv:2402.10038 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language ModelsComments: 16 pages, 4 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: Reinforcement learning from human feedback (RLHF) has been extensively employed to align large language models with user intent. However, proximal policy optimization (PPO) based RLHF is occasionally unstable requiring significant hyperparameter finetuning, and computationally expensive to maximize the estimated reward during alignment. Recently, direct preference optimization (DPO) is proposed to address those challenges. However, DPO relies on contrastive responses generated from human annotator and alternative LLM, instead of the policy model, limiting the effectiveness of the RLHF. In this paper, we addresses both challenges by systematically combining rejection sampling (RS) and DPO. Our proposed method, RS-DPO, initiates with the development of a supervised fine-tuned policy model (SFT). A varied set of k responses per prompt are sampled directly from the SFT model. RS-DPO identifies pairs of contrastive samples based on their reward distribution. Finally, we apply DPO with the contrastive samples to align the model to human preference. Our experiments indicate that our proposed method effectively fine-tunes LLMs with limited resource environments, leading to improved alignment with user intent. Furthermore, it outperforms existing methods, including RS, PPO, and DPO.
- [1586] arXiv:2402.10050 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: On-Demand Myoelectric Control Using Wake Gestures to Eliminate False Activations During Activities of Daily LivingSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: While myoelectric control has recently become a focus of increased research as a possible flexible hands-free input modality, current control approaches are prone to inadvertent false activations in real-world conditions. In this work, a novel myoelectric control paradigm -- on-demand myoelectric control -- is proposed, designed, and evaluated, to reduce the number of unrelated muscle movements that are incorrectly interpreted as input gestures . By leveraging the concept of wake gestures, users were able to switch between a dedicated control mode and a sleep mode, effectively eliminating inadvertent activations during activities of daily living (ADLs). The feasibility of wake gestures was demonstrated in this work through two online ubiquitous EMG control tasks with varying difficulty levels; dismissing an alarm and controlling a robot. The proposed control scheme was able to appropriately ignore almost all non-targeted muscular inputs during ADLs (>99.9%) while maintaining sufficient sensitivity for reliable mode switching during intentional wake gesture elicitation. These results highlight the potential of wake gestures as a critical step towards enabling ubiquitous myoelectric control-based on-demand input for a wide range of applications.
- [1587] arXiv:2402.10052 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Unmemorization in Large Language Models via Self-Distillation and Deliberate ImaginationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: While displaying impressive generation capabilities across many tasks, Large Language Models (LLMs) still struggle with crucial issues of privacy violation and unwanted exposure of sensitive data. This raises an essential question: how should we prevent such undesired behavior of LLMs while maintaining their strong generation and natural language understanding (NLU) capabilities? In this work, we introduce a novel approach termed deliberate imagination in the context of LLM unlearning. Instead of trying to forget memorized data, we employ a self-distillation framework, guiding LLMs to deliberately imagine alternative scenarios. As demonstrated in a wide range of experiments, the proposed method not only effectively unlearns targeted text but also preserves the LLMs' capabilities in open-ended generation tasks as well as in NLU tasks. Our results demonstrate the usefulness of this approach across different models and sizes, and also with parameter-efficient fine-tuning, offering a novel pathway to addressing the challenges with private and sensitive data in LLM applications.
- [1588] arXiv:2402.10055 (cross-list from eess.IV) [ pdf , ps , html , other ]
-
Title: Robust semi-automatic vessel tracing in the human retinal image by an instance segmentation neural networkSubjects: Image and Video Processing (eess.IV) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: The morphology and hierarchy of the vascular systems are essential for perfusion in supporting metabolism. In human retina, one of the most energy-demanding organs, retinal circulation nourishes the entire inner retina by an intricate vasculature emerging and remerging at the optic nerve head (ONH). Thus, tracing the vascular branching from ONH through the vascular tree can illustrate vascular hierarchy and allow detailed morphological quantification, and yet remains a challenging task. Here, we presented a novel approach for a robust semi-automatic vessel tracing algorithm on human fundus images by an instance segmentation neural network (InSegNN). Distinct from semantic segmentation, InSegNN separates and labels different vascular trees individually and therefore enable tracing each tree throughout its branching. We have built-in three strategies to improve robustness and accuracy with temporal learning, spatial multi-sampling, and dynamic probability map. We achieved 83% specificity, and 50% improvement in Symmetric Best Dice (SBD) compared to literature, and outperformed baseline U-net. We have demonstrated tracing individual vessel trees from fundus images, and simultaneously retain the vessel hierarchy information. InSegNN paves a way for any subsequent morphological analysis of vascular morphology in relation to retinal diseases.
- [1589] arXiv:2402.10067 (cross-list from cs.DC) [ pdf , ps , html , other ]
-
Title: LLM-based policy generation for intent-based management of applicationsComments: This article has been accepted for publication in 2023 19th International Conference on Network and Service Management (CNSM), 3rd International Workshop on Analytics for Service and Application Management (AnServApp 2023)Journal-ref: 2023 19th International Conference on Network and Service Management (CNSM), 2023, pp. 1-7Subjects: Distributed, Parallel, and Cluster Computing (cs.DC) ; Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Abstract: Automated management requires decomposing high-level user requests, such as intents, to an abstraction that the system can understand and execute. This is challenging because even a simple intent requires performing a number of ordered steps. And the task of identifying and adapting these steps (as conditions change) requires a decomposition approach that cannot be exactly pre-defined beforehand. To tackle these challenges and support automated intent decomposition and execution, we explore the few-shot capability of Large Language Models (LLMs). We propose a pipeline that progressively decomposes intents by generating the required actions using a policy-based abstraction. This allows us to automate the policy execution by creating a closed control loop for the intent deployment. To do so, we generate and map the policies to APIs and form application management loops that perform the necessary monitoring, analysis, planning and execution. We evaluate our proposal with a use-case to fulfill and assure an application service chain of virtual network functions. Using our approach, we can generalize and generate the necessary steps to realize intents, thereby enabling intent automation for application management.
- [1590] arXiv:2402.10076 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inferenceComments: 9 pages, 8 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: We introduce QUICK, a group of novel optimized CUDA kernels for the efficient inference of quantized Large Language Models (LLMs). QUICK addresses the shared memory bank-conflict problem of state-of-the-art mixed precision matrix multiplication kernels. Our method interleaves the quantized weight matrices of LLMs offline to skip the shared memory write-back after the dequantization. We demonstrate up to 1.91x speedup over existing kernels of AutoAWQ on larger batches and up to 1.94x throughput gain on representative LLM models on various NVIDIA GPU devices.
- [1591] arXiv:2402.10086 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: Explainable AI for Safe and Trustworthy Autonomous Driving: A Systematic ReviewSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Abstract: Artificial Intelligence (AI) shows promising applications for the perception and planning tasks in autonomous driving (AD) due to its superior performance compared to conventional methods. However, inscrutable AI systems exacerbate the existing challenge of safety assurance of AD. One way to mitigate this challenge is to utilize explainable AI (XAI) techniques. To this end, we present the first comprehensive systematic literature review of explainable methods for safe and trustworthy AD. We begin by analyzing the requirements for AI in the context of AD, focusing on three key aspects: data, model, and agency. We find that XAI is fundamental to meeting these requirements. Based on this, we explain the sources of explanations in AI and describe a taxonomy of XAI. We then identify five key contributions of XAI for safe and trustworthy AI in AD, which are interpretable design, interpretable surrogate models, interpretable monitoring, auxiliary explanations, and interpretable validation. Finally, we propose a modular framework called SafeX to integrate these contributions, enabling explanation delivery to users while simultaneously ensuring the safety of AI models.
- [1592] arXiv:2402.10091 (cross-list from cs.DB) [ pdf , ps , html , other ]
-
Title: Text-Based Product Matching -- Semi-Supervised Clustering ApproachSubjects: Databases (cs.DB) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Matching identical products present in multiple product feeds constitutes a crucial element of many tasks of e-commerce, such as comparing product offerings, dynamic price optimization, and selecting the assortment personalized for the client. It corresponds to the well-known machine learning task of entity matching, with its own specificity, like omnipresent unstructured data or inaccurate and inconsistent product descriptions. This paper aims to present a new philosophy to product matching utilizing a semi-supervised clustering approach. We study the properties of this method by experimenting with the IDEC algorithm on the real-world dataset using predominantly textual features and fuzzy string matching, with more standard approaches as a point of reference. Encouraging results show that unsupervised matching, enriched with a small annotated sample of product links, could be a possible alternative to the dominant supervised strategy, requiring extensive manual data labeling.
- [1593] arXiv:2402.10093 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained RepresentationsSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: We introduce MIM (Masked Image Modeling)-Refiner, a contrastive learning boost for pre-trained MIM models. The motivation behind MIM-Refiner is rooted in the insight that optimal representations within MIM models generally reside in intermediate layers. Accordingly, MIM-Refiner leverages multiple contrastive heads that are connected to diverse intermediate layers. In each head, a modified nearest neighbor objective helps to construct respective semantic clusters.
The refinement process is short but effective. Within a few epochs, we refine the features of MIM models from subpar to state-of-the-art, off-the-shelf features. Refining a ViT-H, pre-trained with data2vec 2.0 on ImageNet-1K, achieves new state-of-the-art results in linear probing (84.7%) and low-shot classification among models that are pre-trained on ImageNet-1K. In ImageNet-1K 1-shot classification, MIM-Refiner sets a new state-of-the-art of 64.2%, outperforming larger models that were trained on up to 2000x more data such as DINOv2-g, OpenCLIP-G and MAWS-6.5B. Project page: this https URL - [1594] arXiv:2402.10100 (cross-list from cs.SD) [ pdf , ps , html , other ]
-
Title: Tuning In: Analysis of Audio Classifier Performance in Clinical Settings with Limited DataHamza Mahdi , Eptehal Nashnoush , Rami Saab , Arjun Balachandar , Rishit Dagli , Lucas X. Perri , Houman KhosravaniComments: CHIL 2024Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Abstract: This study assesses deep learning models for audio classification in a clinical setting with the constraint of small datasets reflecting real-world prospective data collection. We analyze CNNs, including DenseNet and ConvNeXt, alongside transformer models like ViT, SWIN, and AST, and compare them against pre-trained audio models such as YAMNet and VGGish. Our method highlights the benefits of pre-training on large datasets before fine-tuning on specific clinical data. We prospectively collected two first-of-their-kind patient audio datasets from stroke patients. We investigated various preprocessing techniques, finding that RGB and grayscale spectrogram transformations affect model performance differently based on the priors they learn from pre-training. Our findings indicate CNNs can match or exceed transformer models in small dataset contexts, with DenseNet-Contrastive and AST models showing notable performance. This study highlights the significance of incremental marginal gains through model selection, pre-training, and preprocessing in sound classification; this offers valuable insights for clinical diagnostics that rely on audio classification.
- [1595] arXiv:2402.10101 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Deep Learning Based Situation Awareness for Multiple Missiles EvasionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: As the effective range of air-to-air missiles increases, it becomes harder for human operators to maintain the situational awareness needed to keep a UAV safe. In this work, we propose a decision support tool to help UAV operators in Beyond Visual Range (BVR) air combat scenarios assess the risks of different options and make decisions based on those. Earlier work focused on the threat posed by a single missile, and in this work, we extend the ideas to several missile threats. The proposed method uses Deep Neural Networks (DNN) to learn from high-fidelity simulations to provide the operator with an outcome estimate for a set of different strategies. Our results demonstrate that the proposed system can manage multiple incoming missiles, evaluate a family of options, and recommend the least risky course of action.
- [1596] arXiv:2402.10107 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Quantized Embedding Vectors for Controllable Diffusion Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Improving the controllability, portability, and inference speed of diffusion language models (DLMs) is a key challenge in natural language generation. While recent research has shown significant success in complex text generation with language models, the memory and computational power are still very demanding and fall short of expectations, which naturally results in low portability and instability for the models. To mitigate these issues, numerous well-established methods were proposed for neural network quantization. To further enhance their portability of independent deployment as well as improve their stability evaluated by language perplexity, we propose a novel approach called the Quantized Embedding Controllable Diffusion Language Model (QE-CDLM). QE-CDLM builds upon the recent successful controllable DLMs by remodeling the task-specific embedding space via quantization. This leads to a gradient-based controller for the generation tasks, and more stable intermediate latent variables are obtained, which naturally brings in an accelerated convergence as well as better controllability. Additionally, the adaption fine-tuning method is employed to reduce tunable weights. Experimental results on five challenging fine-grained control tasks demonstrate that QE-CDLM compares favorably to existing methods in terms of quality and feasibility, achieving better perplexity and lightweight fine-tuning.
- [1597] arXiv:2402.10110 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-TuningSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Instruction tuning is critical to large language models (LLMs) for achieving better instruction following and task adaptation capabilities but its success heavily relies on the training data quality. Many recent methods focus on improving the data quality but often overlook the compatibility of the data with the student model being finetuned. This paper introduces Selective Reflection-Tuning, a novel paradigm that synergizes a teacher LLM's reflection and introspection for improving existing data quality with the data selection capability of the student LLM, to automatically refine existing instruction-tuning data. This teacher-student collaboration produces high-quality and student-compatible instruction-response pairs, resulting in sample-efficient instruction tuning and LLMs of superior performance. Selective Reflection-Tuning is a data augmentation and synthesis that generally improves LLM finetuning and self-improvement without collecting brand-new data. We apply our method to Alpaca and WizardLM data and achieve much stronger and top-tier 7B and 13B LLMs. Our codes, models, and data will be released at this https URL .
- [1598] arXiv:2402.10130 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Is Continual Learning Ready for Real-world Challenges?Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Despite continual learning's long and well-established academic history, its application in real-world scenarios remains rather limited. This paper contends that this gap is attributable to a misalignment between the actual challenges of continual learning and the evaluation protocols in use, rendering proposed solutions ineffective for addressing the complexities of real-world setups. We validate our hypothesis and assess progress to date, using a new 3D semantic segmentation benchmark, OCL-3DSS. We investigate various continual learning schemes from the literature by utilizing more realistic protocols that necessitate online and continual learning for dynamic, real-world scenarios (eg., in robotics and 3D vision applications). The outcomes are sobering: all considered methods perform poorly, significantly deviating from the upper bound of joint offline training. This raises questions about the applicability of existing methods in realistic settings. Our paper aims to initiate a paradigm shift, advocating for the adoption of continual learning methods through new experimental protocols that better emulate real-world conditions to facilitate breakthroughs in the field.
- [1599] arXiv:2402.10135 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Benchmarking federated strategies in Peer-to-Peer Federated learning for biomedical dataJournal-ref: Heliyon 9 (2023) e16925Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: The increasing requirements for data protection and privacy has attracted a huge research interest on distributed artificial intelligence and specifically on federated learning, an emerging machine learning approach that allows the construction of a model between several participants who hold their own private data. In the initial proposal of federated learning the architecture was centralised and the aggregation was done with federated averaging, meaning that a central server will orchestrate the federation using the most straightforward averaging strategy. This research is focused on testing different federated strategies in a peer-to-peer environment. The authors propose various aggregation strategies for federated learning, including weighted averaging aggregation, using different factors and strategies based on participant contribution. The strategies are tested with varying data sizes to identify the most robust ones. This research tests the strategies with several biomedical datasets and the results of the experiments show that the accuracy-based weighted average outperforms the classical federated averaging method.
- [1600] arXiv:2402.10142 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Tracking Changing Probabilities via Dynamic LearnersComments: 63 pages, 24 figures, 17 tablesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Consider a predictor, a learner, whose input is a stream of discrete items. The predictor's task, at every time point, is probabilistic multiclass prediction, i.e., to predict which item may occur next by outputting zero or more candidate items, each with a probability, after which the actual item is revealed and the predictor learns from this observation. To output probabilities, the predictor keeps track of the proportions of the items it has seen. The stream is unbounded and the predictor has finite limited space and we seek efficient prediction and update techniques: the set of items is unknown to the predictor and their totality can also grow unbounded. Moreover, there is non-stationarity: the underlying frequencies of items may change, substantially, from time to time. For instance, new items may start appearing and a few recently frequent items may cease to occur again. The predictor, being space-bounded, need only provide probabilities for those items with (currently) sufficiently high frequency, i.e., the salient items. This problem is motivated in the setting of prediction games, a self-supervised learning regime where concepts serve as both the predictors and the predictands, and the set of concepts grows over time, resulting in non-stationarities as new concepts are generated and used. We develop sparse multiclass moving average techniques designed to respond to such non-stationarities in a timely manner. One technique is based on the exponentiated moving average (EMA) and another is based on queuing a few count snapshots. We show that the combination, and in particular supporting dynamic predictand-specific learning rates, offers advantages in terms of faster change detection and convergence.
- [1601] arXiv:2402.10168 (cross-list from cs.SD) [ pdf , ps , other ]
-
Title: DeepSRGM -- Sequence Classification and Ranking in Indian Classical Music with Deep LearningSubjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Abstract: A vital aspect of Indian Classical Music (ICM) is Raga, which serves as a melodic framework for compositions and improvisations alike. Raga Recognition is an important music information retrieval task in ICM as it can aid numerous downstream applications ranging from music recommendations to organizing huge music collections. In this work, we propose a deep learning based approach to Raga recognition. Our approach employs efficient pre possessing and learns temporal sequences in music data using Long Short Term Memory based Recurrent Neural Networks (LSTM-RNN). We train and test the network on smaller sequences sampled from the original audio while the final inference is performed on the audio as a whole. Our method achieves an accuracy of 88.1% and 97 % during inference on the Comp Music Carnatic dataset and its 10 Raga subset respectively making it the state-of-the-art for the Raga recognition task. Our approach also enables sequence ranking which aids us in retrieving melodic patterns from a given music data base that are closely related to the presented query sequence.
- [1602] arXiv:2402.10171 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Data Engineering for Scaling Language Models to 128K ContextComments: Code at this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: We study the continual pretraining recipe for scaling language models' context lengths to 128K, with a focus on data engineering. We hypothesize that long context modeling, in particular \textit{the ability to utilize information at arbitrary input locations}, is a capability that is mostly already acquired through large-scale pretraining, and that this capability can be readily extended to contexts substantially longer than seen during training~(e.g., 4K to 128K) through lightweight continual pretraining on appropriate data mixture. We investigate the \textit{quantity} and \textit{quality} of the data for continual pretraining: (1) for quantity, we show that 500 million to 5 billion tokens are enough to enable the model to retrieve information anywhere within the 128K context; (2) for quality, our results equally emphasize \textit{domain balance} and \textit{length upsampling}. Concretely, we find that naively upsampling longer data on certain domains like books, a common practice of existing work, gives suboptimal performance, and that a balanced domain mixture is important. We demonstrate that continual pretraining of the full model on 1B-5B tokens of such data is an effective and affordable strategy for scaling the context length of language models to 128K. Our recipe outperforms strong open-source long-context models and closes the gap to frontier models like GPT-4 128K.
- [1603] arXiv:2402.10176 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning DatasetComments: Data and models are available at this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Recent work has shown the immense potential of synthetically generated datasets for training large language models (LLMs), especially for acquiring targeted skills. Current large-scale math instruction tuning datasets such as MetaMathQA (Yu et al., 2024) and MAmmoTH (Yue et al., 2024) are constructed using outputs from closed-source LLMs with commercially restrictive licenses. A key reason limiting the use of open-source LLMs in these data generation pipelines has been the wide gap between the mathematical skills of the best closed-source LLMs, such as GPT-4, and the best open-source LLMs. Building on the recent progress in open-source LLMs, our proposed prompting novelty, and some brute-force scaling, we construct OpenMathInstruct-1, a math instruction tuning dataset with 1.8M problem-solution pairs. The dataset is constructed by synthesizing code-interpreter solutions for GSM8K and MATH, two popular math reasoning benchmarks, using the recently released and permissively licensed Mixtral model. Our best model, OpenMath-CodeLlama-70B, trained on a subset of OpenMathInstruct-1, achieves a score of 84.6% on GSM8K and 50.7% on MATH, which is competitive with the best gpt-distilled models. We release our code, models, and the OpenMathInstruct-1 dataset under a commercially permissive license.
- [1604] arXiv:2402.10177 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Large Scale Constrained Clustering With Reinforcement LearningComments: LEANOPT-24 AAAISubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Given a network, allocating resources at clusters level, rather than at each node, enhances efficiency in resource allocation and usage. In this paper, we study the problem of finding fully connected disjoint clusters to minimize the intra-cluster distances and maximize the number of nodes assigned to the clusters, while also ensuring that no two nodes within a cluster exceed a threshold distance. While the problem can easily be formulated using a binary linear model, traditional combinatorial optimization solvers struggle when dealing with large-scale instances. We propose an approach to solve this constrained clustering problem via reinforcement learning. Our method involves training an agent to generate both feasible and (near) optimal solutions. The agent learns problem-specific heuristics, tailored to the instances encountered in this task. In the results section, we show that our algorithm finds near optimal solutions, even for large scale instances.
- [1605] arXiv:2402.10184 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Rethinking Information Structures in RLHF: Reward Generalization from a Graph Theory PerspectiveTianyi Qiu , Fanzhi Zeng , Jiaming Ji , Dong Yan , Kaile Wang , Jiayi Zhou , Yang Han , Josef Dai , Xuehai Pan , Yaodong YangSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Discrete Mathematics (cs.DM)
Abstract: There is a trilemma in reinforcement learning from human feedback (RLHF): the incompatibility between highly diverse contexts, low labeling cost, and reliable alignment performance. We mitigate such incompatibility through the design of dataset information structures during reward modeling, and introduce the Induced Bayesian Network (IBN), the first theory of reward generalization capable of generating substantial verified predictions on large language models (LLMs). Specifically, we first reexamine the RLHF process and propose a theoretical framework portraying it as an autoencoding process over text distributions. Our framework formalizes the RLHF objective of ensuring distributional consistency between human preference and LLM behavior. Then, based on this framework, we introduce the IBN to analyze generalization in the reward modeling stage of RLHF. Drawing from random graph theory and causal analysis, it enables empirically grounded derivation of generalization error bounds, a key improvement over classical theories of generalization. Finally, an insight from our analysis is the superiority of the tree-based information structure in reward modeling, compared to chain-based baselines in conventional RLHF methods. With IBN, we derive that in complex contexts with limited data, the tree-based reward model (RM), trained on a tree-structured preference dataset, induces up to $\Theta(\log n/\log\log n)$ times less variance than the baseline, where $n$ is the dataset size. As validation, we demonstrate that on three NLP tasks, the tree-based RM achieves 65% win rate on average against chain-based baselines. It shows that alignment performance can be gained for free via the design of dataset information structure, without the need for any other changes.
- [1606] arXiv:2402.10192 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Multi-Excitation Projective Simulation with a Many-Body Physics Inspired Inductive BiasComments: 24 pages, 8 figures; Code repository at this https URL . Added figures and shortened computer maintenance section text for better readabilitySubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Discrete Mathematics (cs.DM); Quantum Physics (quant-ph)
Abstract: With the impressive progress of deep learning, applications relying on machine learning are increasingly being integrated into daily life. However, most deep learning models have an opaque, oracle-like nature making it difficult to interpret and understand their decisions. This problem led to the development of the field known as eXplainable Artificial Intelligence (XAI). One method in this field known as Projective Simulation (PS) models a chain-of-thought as a random walk of a particle on a graph with vertices that have concepts attached to them. While this description has various benefits, including the possibility of quantization, it cannot be naturally used to model thoughts that combine several concepts simultaneously. To overcome this limitation, we introduce Multi-Excitation Projective Simulation (mePS), a generalization that considers a chain-of-thought to be a random walk of several particles on a hypergraph. A definition for a dynamic hypergraph is put forward to describe the agent's training history along with applications to AI and hypergraph visualization. An inductive bias inspired by the remarkably successful few-body interaction models used in quantum many-body physics is formalized for our classical mePS framework and employed to tackle the exponential complexity associated with naive implementations of hypergraphs. We prove that our inductive bias reduces the complexity from exponential to polynomial, with the exponent representing the cutoff on how many particles can interact. We numerically apply our method to two toy environments and a more complex scenario modelling the diagnosis of a broken computer. These environments demonstrate the resource savings provided by an appropriate choice of inductive bias, as well as showcasing aspects of interpretability. A quantum model for mePS is also briefly outlined and some future directions for it are discussed.
- [1607] arXiv:2402.10196 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: A Trembling House of Cards? Mapping Adversarial Attacks against Language AgentsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Language agents powered by large language models (LLMs) have seen exploding development. Their capability of using language as a vehicle for thought and communication lends an incredible level of flexibility and versatility. People have quickly capitalized on this capability to connect LLMs to a wide range of external components and environments: databases, tools, the Internet, robotic embodiment, etc. Many believe an unprecedentedly powerful automation technology is emerging. However, new automation technologies come with new safety risks, especially for intricate systems like language agents. There is a surprisingly large gap between the speed and scale of their development and deployment and our understanding of their safety risks. Are we building a house of cards? In this position paper, we present the first systematic effort in mapping adversarial attacks against language agents. We first present a unified conceptual framework for agents with three major components: Perception, Brain, and Action. Under this framework, we present a comprehensive discussion and propose 12 potential attack scenarios against different components of an agent, covering different attack strategies (e.g., input manipulation, adversarial demonstrations, jailbreaking, backdoors). We also draw connections to successful attack strategies previously applied to LLMs. We emphasize the urgency to gain a thorough understanding of language agent risks before their widespread deployment.
- [1608] arXiv:2402.10204 (cross-list from astro-ph.IM) [ pdf , ps , html , other ]
-
Title: Radio-astronomical Image Reconstruction with Conditional Denoising Diffusion ModelMariia Drozdova , Vitaliy Kinakh , Omkar Bait , Olga Taran , Erica Lastufka , Miroslava Dessauges-Zavadsky , Taras Holotyak , Daniel Schaerer , Slava VoloshynovskiyComments: In production in Astronomy&AstrophyicsSubjects: Instrumentation and Methods for Astrophysics (astro-ph.IM) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Reconstructing sky models from dirty radio images for accurate source localization and flux estimation is crucial for studying galaxy evolution at high redshift, especially in deep fields using instruments like the Atacama Large Millimetre Array (ALMA). With new projects like the Square Kilometre Array (SKA), there's a growing need for better source extraction methods. Current techniques, such as CLEAN and PyBDSF, often fail to detect faint sources, highlighting the need for more accurate methods. This study proposes using stochastic neural networks to rebuild sky models directly from dirty images. This method can pinpoint radio sources and measure their fluxes with related uncertainties, marking a potential improvement in radio source characterization. We tested this approach on 10164 images simulated with the CASA tool simalma, based on ALMA's Cycle 5.3 antenna setup. We applied conditional Denoising Diffusion Probabilistic Models (DDPMs) for sky models reconstruction, then used Photutils to determine source coordinates and fluxes, assessing the model's performance across different water vapor levels. Our method showed excellence in source localization, achieving more than 90% completeness at a signal-to-noise ratio (SNR) as low as 2. It also surpassed PyBDSF in flux estimation, accurately identifying fluxes for 96% of sources in the test set, a significant improvement over CLEAN+ PyBDSF's 57%. Conditional DDPMs is a powerful tool for image-to-image translation, yielding accurate and robust characterisation of radio sources, and outperforming existing methodologies. While this study underscores its significant potential for applications in radio astronomy, we also acknowledge certain limitations that accompany its usage, suggesting directions for further refinement and research.
- [1609] arXiv:2402.10206 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Ising on the Graph: Task-specific Graph Subsampling via the Ising ModelSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Reducing a graph while preserving its overall structure is an important problem with many applications. Typically, the reduction approaches either remove edges (sparsification) or merge nodes (coarsening) in an unsupervised way with no specific downstream task in mind. In this paper, we present an approach for subsampling graph structures using an Ising model defined on either the nodes or edges and learning the external magnetic field of the Ising model using a graph neural network. Our approach is task-specific as it can learn how to reduce a graph for a specific downstream task in an end-to-end fashion. The utilized loss function of the task does not even have to be differentiable. We showcase the versatility of our approach on three distinct applications: image segmentation, 3D shape sparsification, and sparse approximate matrix inverse determination.
- [1610] arXiv:2402.10207 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference AdjustmentComments: 20 pages, 12 figures, 6 tablesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: We consider the problem of multi-objective alignment of foundation models with human preferences, which is a critical step towards helpful and harmless AI systems. However, it is generally costly and unstable to fine-tune large foundation models using reinforcement learning (RL), and the multi-dimensionality, heterogeneity, and conflicting nature of human preferences further complicate the alignment process. In this paper, we introduce Rewards-in-Context (RiC), which conditions the response of a foundation model on multiple rewards in its prompt context and applies supervised fine-tuning for alignment. The salient features of RiC are simplicity and adaptivity, as it only requires supervised fine-tuning of a single foundation model and supports dynamic adjustment for user preferences during inference time. Inspired by the analytical solution of an abstracted convex optimization problem, our dynamic inference-time adjustment method approaches the Pareto-optimal solution for multiple objectives. Empirical evidence demonstrates the efficacy of our method in aligning both Large Language Models (LLMs) and diffusion models to accommodate diverse rewards with only around 10% GPU hours compared with multi-objective RL baseline.
- [1611] arXiv:2402.10210 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Self-Play Fine-Tuning of Diffusion Models for Text-to-Image GenerationComments: 28 pages, 8 figures, 10 tablesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Abstract: Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI), especially when compared with the remarkable progress made in fine-tuning Large Language Models (LLMs). While cutting-edge diffusion models such as Stable Diffusion (SD) and SDXL rely on supervised fine-tuning, their performance inevitably plateaus after seeing a certain volume of data. Recently, reinforcement learning (RL) has been employed to fine-tune diffusion models with human preference data, but it requires at least two images ("winner" and "loser" images) for each text prompt. In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion), where the diffusion model engages in competition with its earlier versions, facilitating an iterative self-improvement process. Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment. Our experiments on the Pick-a-Pic dataset reveal that SPIN-Diffusion outperforms the existing supervised fine-tuning method in aspects of human preference alignment and visual appeal right from its first iteration. By the second iteration, it exceeds the performance of RLHF-based methods across all metrics, achieving these results with less data.
- [1612] arXiv:2402.10213 (cross-list from q-bio.NC) [ pdf , ps , html , other ]
-
Title: Clustering Inductive Biases with Unrolled NetworksSubjects: Neurons and Cognition (q-bio.NC) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The classical sparse coding (SC) model represents visual stimuli as a linear combination of a handful of learned basis functions that are Gabor-like when trained on natural image data. However, the Gabor-like filters learned by classical sparse coding far overpredict well-tuned simple cell receptive field profiles observed empirically. While neurons fire sparsely, neuronal populations are also organized in physical space by their sensitivity to certain features. In V1, this organization is a smooth progression of orientations along the cortical sheet. A number of subsequent models have either discarded the sparse dictionary learning framework entirely or whose updates have yet to take advantage of the surge in unrolled, neural dictionary learning architectures. A key missing theme of these updates is a stronger notion of \emph{structured sparsity}. We propose an autoencoder architecture (WLSC) whose latent representations are implicitly, locally organized for spectral clustering through a Laplacian quadratic form of a bipartite graph, which generates a diverse set of artificial receptive fields that match primate data in V1 as faithfully as recent contrastive frameworks like Local Low Dimensionality, or LLD \citep{lld} that discard sparse dictionary learning. By unifying sparse and smooth coding in models of the early visual cortex through our autoencoder, we also show that our regularization can be interpreted as early-stage specialization of receptive fields to certain classes of stimuli; that is, we induce a weak clustering bias for later stages of cortex where functional and spatial segregation (i.e. topography) are known to occur. The results show an imperative for \emph{spatial regularization} of both the receptive fields and firing rates to begin to describe feature disentanglement in V1 and beyond.
- [1613] arXiv:2402.10222 (cross-list from cs.RO) [ pdf , ps , html , other ]
-
Title: Autonomous Vehicle Patrolling Through Deep Reinforcement Learning: Learning to Communicate and CooperateSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Abstract: Autonomous vehicles are suited for continuous area patrolling problems. Finding an optimal patrolling strategy can be challenging due to unknown environmental factors, such as wind or landscape; or autonomous vehicles' constraints, such as limited battery life or hardware failures. Importantly, patrolling large areas often requires multiple agents to collectively coordinate their actions. However, an optimal coordination strategy is often non-trivial to be manually defined due to the complex nature of patrolling environments. In this paper, we consider a patrolling problem with environmental factors, agent limitations, and three typical cooperation problems -- collision avoidance, congestion avoidance, and patrolling target negotiation. We propose a multi-agent reinforcement learning solution based on a reinforced inter-agent learning (RIAL) method. With this approach, agents are trained to develop their own communication protocol to cooperate during patrolling where faults can and do occur. The solution is validated through simulation experiments and is compared with several state-of-the-art patrolling solutions from different perspectives, including the overall patrol performance, the collision avoidance performance, the efficiency of battery recharging strategies, and the overall fault tolerance.
- [1614] arXiv:2402.10224 (cross-list from cs.RO) [ pdf , ps , html , other ]
-
Title: Human-Centric Goal Reasoning with Ripple-Down RulesComments: Proceedings of the Ninth Goal Reasoning Workshop (Advances in Cognitive Systems, 2021)Subjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Abstract: ActorSim is a goal reasoning framework developed at the Naval Research Laboratory. Originally, all goal reasoning rules were hand-crafted. This work extends ActorSim with the capability of learning by demonstration, that is, when a human trainer disagrees with a decision made by the system, the trainer can take over and show the system the correct decision. The learning component uses Ripple-Down Rules (RDR) to build new decision rules to correctly handle similar cases in the future. The system is demonstrated using the RoboCup Rescue Agent Simulation, which simulates a city-wide disaster, requiring emergency services, including fire, ambulance and police, to be dispatched to different sites to evacuate civilians from dangerous situations. The RDRs are implemented in a scripting language, FrameScript, which is used to mediate between ActorSim and the agent simulator. Using Ripple-Down Rules, ActorSim can scale to an order of magnitude more goals than the previous version.
- [1615] arXiv:2402.10228 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: HyperAgent: A Simple, Scalable, Efficient and Provable Reinforcement Learning Framework for Complex EnvironmentsComments: Bridging the theory and practice! Invited talk in Informs Optimization Conference 2024 and International Symposium on Mathematical Programming 2024!Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: To solve complex tasks under resource constraints, reinforcement learning (RL) agents need to be simple, efficient, and scalable, addressing (1) large state spaces and (2) the continuous accumulation of interaction data. We propose HyperAgent, an RL framework featuring the hypermodel and index sampling schemes that enable computation-efficient incremental approximation for the posteriors associated with general value functions without the need for conjugacy, and data-efficient action selection. Implementing HyperAgent is straightforward, requiring only one additional module beyond what is necessary for Double-DQN. HyperAgent stands out as the first method to offer robust performance in large-scale deep RL benchmarks while achieving provably scalable per-step computational complexity and attaining sublinear regret under tabular assumptions. HyperAgent can solve Deep Sea hard exploration problems with episodes that optimally scale with problem size and exhibits significant efficiency gains in both data and computation under the Atari benchmark. The core of our theoretical analysis is the sequential posterior approximation argument, enabled by the first analytical tool for sequential random projection -- a non-trivial martingale extension of the Johnson-Lindenstrauss. This work bridges the theoretical and practical realms of RL, establishing a new benchmark for RL algorithm design.
- [1616] arXiv:2402.10236 (cross-list from cs.MA) [ pdf , ps , html , other ]
-
Title: Discovering Sensorimotor Agency in Cellular Automata using Diversity SearchGautier Hamon , Mayalen Etcheverry , Bert Wang-Chak Chan , Clément Moulin-Frier , Pierre-Yves OudeyerSubjects: Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The research field of Artificial Life studies how life-like phenomena such as autopoiesis, agency, or self-regulation can self-organize in computer simulations. In cellular automata (CA), a key open-question has been whether it it is possible to find environment rules that self-organize robust "individuals" from an initial state with no prior existence of things like "bodies", "brain", "perception" or "action". In this paper, we leverage recent advances in machine learning, combining algorithms for diversity search, curriculum learning and gradient descent, to automate the search of such "individuals", i.e. localized structures that move around with the ability to react in a coherent manner to external obstacles and maintain their integrity, hence primitive forms of sensorimotor agency. We show that this approach enables to find systematically environmental conditions in CA leading to self-organization of such basic forms of agency. Through multiple experiments, we show that the discovered agents have surprisingly robust capabilities to move, maintain their body integrity and navigate among various obstacles. They also show strong generalization abilities, with robustness to changes of scale, random updates or perturbations from the environment not seen during training. We discuss how this approach opens new perspectives in AI and synthetic bioengineering.
- [1617] arXiv:2402.10240 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: A Dynamical View of the Question of WhyComments: Accepted at the Twelfth International Conference on Learning Representations (ICLR'24)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract: We address causal reasoning in multivariate time series data generated by stochastic processes. Existing approaches are largely restricted to static settings, ignoring the continuity and emission of variations across time. In contrast, we propose a learning paradigm that directly establishes causation between events in the course of time. We present two key lemmas to compute causal contributions and frame them as reinforcement learning problems. Our approach offers formal and computational tools for uncovering and quantifying causal relationships in diffusion processes, subsuming various important settings such as discrete-time Markov decision processes. Finally, in fairly intricate experiments and through sheer learning, our framework reveals and quantifies causal links, which otherwise seem inexplicable.
- [1618] arXiv:2402.10248 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: A Data-Driven Supervised Machine Learning Approach to Estimating Global Ambient Air Pollution Concentrations With Associated Prediction IntervalsComments: Main Paper: 25 pages, 15 figures, 5 tables. Supplementary: 4 pages, 3 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Global ambient air pollution, a transboundary challenge, is typically addressed through interventions relying on data from spatially sparse and heterogeneously placed monitoring stations. These stations often encounter temporal data gaps due to issues such as power outages. In response, we have developed a scalable, data-driven, supervised machine learning framework. This model is designed to impute missing temporal and spatial measurements, thereby generating a comprehensive dataset for pollutants including NO$_2$, O$_3$, PM$_{10}$, PM$_{2.5}$, and SO$_2$. The dataset, with a fine granularity of 0.25$^{\circ}$ at hourly intervals and accompanied by prediction intervals for each estimate, caters to a wide range of stakeholders relying on outdoor air pollution data for downstream assessments. This enables more detailed studies. Additionally, the model's performance across various geographical locations is examined, providing insights and recommendations for strategic placement of future monitoring stations to further enhance the model's accuracy.
- [1619] arXiv:2402.10251 (cross-list from q-bio.NC) [ pdf , ps , html , other ]
-
Title: Brant-2: Foundation Model for Brain SignalsComments: 14 pages, 7 figuresSubjects: Neurons and Cognition (q-bio.NC) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Signal Processing (eess.SP)
Abstract: Foundational models benefit from pre-training on large amounts of unlabeled data and enable strong performance in a wide variety of applications with a small amount of labeled data. Such models can be particularly effective in analyzing brain signals, as this field encompasses numerous application scenarios, and it is costly to perform large-scale annotation. In this work, we present the largest foundation model in brain signals, Brant-2. Compared to Brant, a foundation model designed for intracranial neural signals, Brant-2 not only exhibits robustness towards data variations and modeling scales but also can be applied to a broader range of brain neural data. By experimenting on an extensive range of tasks, we demonstrate that Brant-2 is adaptive to various application scenarios in brain signals. Further analyses reveal the scalability of the Brant-2, validate each component's effectiveness, and showcase our model's ability to maintain performance in scenarios with scarce labels.
- [1620] arXiv:2402.10283 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Backdoor Attack against One-Class Sequential Anomaly Detection ModelsComments: This work is accepted by the PAKDD 2024. 12 pagesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Information Theory (cs.IT)
Abstract: Deep anomaly detection on sequential data has garnered significant attention due to the wide application scenarios. However, deep learning-based models face a critical security threat - their vulnerability to backdoor attacks. In this paper, we explore compromising deep sequential anomaly detection models by proposing a novel backdoor attack strategy. The attack approach comprises two primary steps, trigger generation and backdoor injection. Trigger generation is to derive imperceptible triggers by crafting perturbed samples from the benign normal data, of which the perturbed samples are still normal. The backdoor injection is to properly inject the backdoor triggers to comprise the model only for the samples with triggers. The experimental results demonstrate the effectiveness of our proposed attack strategy by injecting backdoors on two well-established one-class anomaly detection models.
- [1621] arXiv:2402.10294 (cross-list from cs.HC) [ pdf , ps , html , other ]
-
Title: LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video EditingComments: Paper accepted to the ACM Conference on Intelligent User Interfaces (ACM IUI) 2024Subjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
Abstract: Video creation has become increasingly popular, yet the expertise and effort required for editing often pose barriers to beginners. In this paper, we explore the integration of large language models (LLMs) into the video editing workflow to reduce these barriers. Our design vision is embodied in LAVE, a novel system that provides LLM-powered agent assistance and language-augmented editing features. LAVE automatically generates language descriptions for the user's footage, serving as the foundation for enabling the LLM to process videos and assist in editing tasks. When the user provides editing objectives, the agent plans and executes relevant actions to fulfill them. Moreover, LAVE allows users to edit videos through either the agent or direct UI manipulation, providing flexibility and enabling manual refinement of agent actions. Our user study, which included eight participants ranging from novices to proficient editors, demonstrated LAVE's effectiveness. The results also shed light on user perceptions of the proposed LLM-assisted editing paradigm and its impact on users' creativity and sense of co-creation. Based on these findings, we propose design implications to inform the future development of agent-assisted content editing.
- [1622] arXiv:2402.10334 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: HI-GAN: Hierarchical Inpainting GAN with Auxiliary Inputs for Combined RGB and Depth InpaintingSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Inpainting involves filling in missing pixels or areas in an image, a crucial technique employed in Mixed Reality environments for various applications, particularly in Diminished Reality (DR) where content is removed from a user's visual environment. Existing methods rely on digital replacement techniques which necessitate multiple cameras and incur high costs. AR devices and smartphones use ToF depth sensors to capture scene depth maps aligned with RGB images. Despite speed and affordability, ToF cameras create imperfect depth maps with missing pixels. To address the above challenges, we propose Hierarchical Inpainting GAN (HI-GAN), a novel approach comprising three GANs in a hierarchical fashion for RGBD inpainting. EdgeGAN and LabelGAN inpaint masked edge and segmentation label images respectively, while CombinedRGBD-GAN combines their latent representation outputs and performs RGB and Depth inpainting. Edge images and particularly segmentation label images as auxiliary inputs significantly enhance inpainting performance by complementary context and hierarchical optimization. We believe we make the first attempt to incorporate label images into inpainting process.Unlike previous approaches requiring multiple sequential models and separate outputs, our work operates in an end-to-end manner, training all three models simultaneously and hierarchically. Specifically, EdgeGAN and LabelGAN are first optimized separately and further optimized inside CombinedRGBD-GAN to enhance inpainting quality. Experiments demonstrate that HI-GAN works seamlessly and achieves overall superior performance compared with existing approaches.
- [1623] arXiv:2402.10340 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: On the Safety Concerns of Deploying LLMs/VLMs in Robotics: Highlighting the Risks and VulnerabilitiesXiyang Wu , Ruiqi Xian , Tianrui Guan , Jing Liang , Souradip Chakraborty , Fuxiao Liu , Brian Sadler , Dinesh Manocha , Amrit Singh BediSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: In this paper, we highlight the critical issues of robustness and safety associated with integrating large language models (LLMs) and vision-language models (VLMs) into robotics applications. Recent works have focused on using LLMs and VLMs to improve the performance of robotics tasks, such as manipulation, navigation, etc. However, such integration can introduce significant vulnerabilities, in terms of their susceptibility to adversarial attacks due to the language models, potentially leading to catastrophic consequences. By examining recent works at the interface of LLMs/VLMs and robotics, we show that it is easy to manipulate or misguide the robot's actions, leading to safety hazards. We define and provide examples of several plausible adversarial attacks, and conduct experiments on three prominent robot frameworks integrated with a language model, including KnowNo VIMA, and Instruct2Act, to assess their susceptibility to these attacks. Our empirical findings reveal a striking vulnerability of LLM/VLM-robot integrated systems: simple adversarial attacks can significantly undermine the effectiveness of LLM/VLM-robot integrated systems. Specifically, our data demonstrate an average performance deterioration of 21.2% under prompt attacks and a more alarming 30.2% under perception attacks. These results underscore the critical need for robust countermeasures to ensure the safe and reliable deployment of the advanced LLM/VLM-based robotic systems.
- [1624] arXiv:2402.10342 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data UtilizationSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Reinforcement Learning from Human Feedback (RLHF) has achieved impressive empirical successes while relying on a small amount of human feedback. However, there is limited theoretical justification for this phenomenon. Additionally, most recent studies focus on value-based algorithms despite the recent empirical successes of policy-based algorithms. In this work, we consider an RLHF algorithm based on policy optimization (PO-RLHF). The algorithm is based on the popular Policy Cover-Policy Gradient (PC-PG) algorithm, which assumes knowledge of the reward function. In PO-RLHF, knowledge of the reward function is not assumed and the algorithm relies on trajectory-based comparison feedback to infer the reward function. We provide performance bounds for PO-RLHF with low query complexity, which provides insight into why a small amount of human feedback may be sufficient to get good performance with RLHF. A key novelty is our trajectory-level elliptical potential analysis technique used to infer reward function parameters when comparison queries rather than reward observations are used. We provide and analyze algorithms in two settings: linear and neural function approximation, PG-RLHF and NN-PG-RLHF, respectively.
- [1625] arXiv:2402.10350 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Large Language Models for Forecasting and Anomaly Detection: A Systematic Literature ReviewJing Su , Chufeng Jiang , Xin Jin , Yuxin Qiao , Tingsong Xiao , Hongda Ma , Rong Wei , Zhi Jing , Jiajun Xu , Junhong LinSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: This systematic literature review comprehensively examines the application of Large Language Models (LLMs) in forecasting and anomaly detection, highlighting the current state of research, inherent challenges, and prospective future directions. LLMs have demonstrated significant potential in parsing and analyzing extensive datasets to identify patterns, predict future events, and detect anomalous behavior across various domains. However, this review identifies several critical challenges that impede their broader adoption and effectiveness, including the reliance on vast historical datasets, issues with generalizability across different contexts, the phenomenon of model hallucinations, limitations within the models' knowledge boundaries, and the substantial computational resources required. Through detailed analysis, this review discusses potential solutions and strategies to overcome these obstacles, such as integrating multimodal data, advancements in learning methodologies, and emphasizing model explainability and computational efficiency. Moreover, this review outlines critical trends that are likely to shape the evolution of LLMs in these fields, including the push toward real-time processing, the importance of sustainable modeling practices, and the value of interdisciplinary collaboration. Conclusively, this review underscores the transformative impact LLMs could have on forecasting and anomaly detection while emphasizing the need for continuous innovation, ethical considerations, and practical solutions to realize their full potential.
- [1626] arXiv:2402.10373 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical DomainsYanis Labrak , Adrien Bazoge , Emmanuel Morin , Pierre-Antoine Gourraud , Mickael Rouvier , Richard DufourSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Large Language Models (LLMs) have demonstrated remarkable versatility in recent years, offering potential applications across specialized domains such as healthcare and medicine. Despite the availability of various open-source LLMs tailored for health contexts, adapting general-purpose LLMs to the medical domain presents significant challenges. In this paper, we introduce BioMistral, an open-source LLM tailored for the biomedical domain, utilizing Mistral as its foundation model and further pre-trained on PubMed Central. We conduct a comprehensive evaluation of BioMistral on a benchmark comprising 10 established medical question-answering (QA) tasks in English. We also explore lightweight models obtained through quantization and model merging approaches. Our results demonstrate BioMistral's superior performance compared to existing open-source medical models and its competitive edge against proprietary counterparts. Finally, to address the limited availability of data beyond English and to assess the multilingual generalization of medical LLMs, we automatically translated and evaluated this benchmark into 7 other languages. This marks the first large-scale multilingual evaluation of LLMs in the medical domain. Datasets, multilingual evaluation benchmarks, scripts, and all the models obtained during our experiments are freely released.
- [1627] arXiv:2402.10380 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Subgraph-level Universal Prompt TuningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: In the evolving landscape of machine learning, the adaptation of pre-trained models through prompt tuning has become increasingly prominent. This trend is particularly observable in the graph domain, where diverse pre-training strategies present unique challenges in developing effective prompt-based tuning methods for graph neural networks. Previous approaches have been limited, focusing on specialized prompting functions tailored to models with edge prediction pre-training tasks. These methods, however, suffer from a lack of generalizability across different pre-training strategies. Recently, a simple prompt tuning method has been designed for any pre-training strategy, functioning within the input graph's feature space. This allows it to theoretically emulate any type of prompting function, thereby significantly increasing its versatility for a range of downstream applications. Nevertheless, the capacity of such simple prompts to fully grasp the complex contexts found in graphs remains an open question, necessitating further investigation. Addressing this challenge, our work introduces the Subgraph-level Universal Prompt Tuning (SUPT) approach, focusing on the detailed context within subgraphs. In SUPT, prompt features are assigned at the subgraph-level, preserving the method's universal capability. This requires extremely fewer tuning parameters than fine-tuning-based methods, outperforming them in 42 out of 45 full-shot scenario experiments with an average improvement of over 2.5%. In few-shot scenarios, it excels in 41 out of 45 experiments, achieving an average performance increase of more than 6.6%.
- [1628] arXiv:2402.10381 (cross-list from cs.IR) [ pdf , ps , html , other ]
-
Title: UMAIR-FPS: User-aware Multi-modal Animation Illustration Recommendation Fusion with Painting StyleComments: Accepted by DASFAA 2024 Research trackSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI)
Abstract: The rapid advancement of high-quality image generation models based on AI has generated a deluge of anime illustrations. Recommending illustrations to users within massive data has become a challenging and popular task. However, existing anime recommendation systems have focused on text features but still need to integrate image features. In addition, most multi-modal recommendation research is constrained by tightly coupled datasets, limiting its applicability to anime illustrations. We propose the User-aware Multi-modal Animation Illustration Recommendation Fusion with Painting Style (UMAIR-FPS) to tackle these gaps. In the feature extract phase, for image features, we are the first to combine image painting style features with semantic features to construct a dual-output image encoder for enhancing representation. For text features, we obtain text embeddings based on fine-tuning Sentence-Transformers by incorporating domain knowledge that composes a variety of domain text pairs from multilingual mappings, entity relationships, and term explanation perspectives, respectively. In the multi-modal fusion phase, we novelly propose a user-aware multi-modal contribution measurement mechanism to weight multi-modal features dynamically according to user features at the interaction level and employ the DCN-V2 module to model bounded-degree multi-modal crosses effectively. UMAIR-FPS surpasses the stat-of-the-art baselines on large real-world datasets, demonstrating substantial performance enhancements.
- [1629] arXiv:2402.10392 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Pretext Training Algorithms for Event Sequence DataSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Pretext training followed by task-specific fine-tuning has been a successful approach in vision and language domains. This paper proposes a self-supervised pretext training framework tailored to event sequence data. We introduce a novel alignment verification task that is specialized to event sequences, building on good practices in masked reconstruction and contrastive learning. Our pretext tasks unlock foundational representations that are generalizable across different down-stream tasks, including next-event prediction for temporal point process models, event sequence classification, and missing event interpolation. Experiments on popular public benchmarks demonstrate the potential of the proposed method across different tasks and data domains.
- [1630] arXiv:2402.10393 (cross-list from cs.GL) [ pdf , ps , other ]
-
Title: Darwin Turing Dawkins: Building a General Theory of EvolutionComments: 247 pagesSubjects: General Literature (cs.GL) ; Artificial Intelligence (cs.AI); Populations and Evolution (q-bio.PE)
Abstract: Living things, computers, societies, and even books are part of a grand evolutionary struggle to survive. That struggle shapes nature, nations, religions, art, science, and you. What you think, feel, and do is determined by it. Darwinian evolution does not apply solely to the genes that are stored in DNA. Using the insights of Alan Turing and Richard Dawkins, we will see that it also applies to the memes we store in our brains and the information we store in our computers. The next time you run for president, fight a war, or just deal with the ordinary problems humans are heir to, perhaps this book will be of use. If you want to understand why and when you will die, or if you want to achieve greatness this book may help. If you are concerned about where the computer revolution is headed, this book may provide some answers.
- [1631] arXiv:2402.10403 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Polyhedral Complex Derivation from Piecewise Trilinear NetworksSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Abstract: Recent advancements in visualizing deep neural networks provide insights into their structures and mesh extraction from Continuous Piecewise Affine (CPWA) functions. Meanwhile, developments in neural surface representation learning incorporate non-linear positional encoding, addressing issues like spectral bias; however, this poses challenges in applying mesh extraction techniques based on CPWA functions. Focusing on trilinear interpolating methods as positional encoding, we present theoretical insights and an analytical mesh extraction, showing the transformation of hypersurfaces to flat planes within the trilinear region under the eikonal constraint. Moreover, we introduce a method for approximating intersecting points among three hypersurfaces contributing to broader applications. We empirically validate correctness and parsimony through chamfer distance and efficiency, and angular distance, while examining the correlation between the eikonal loss and the planarity of the hypersurfaces.
- [1632] arXiv:2402.10404 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Explaining generative diffusion models via visual analysis for interpretable decision-making processComments: 22 pages, published in Expert Systems with ApplicationsJournal-ref: Expert Systems with Applications 248 (2024) 123231Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Diffusion models have demonstrated remarkable performance in generation tasks. Nevertheless, explaining the diffusion process remains challenging due to it being a sequence of denoising noisy images that are difficult for experts to interpret. To address this issue, we propose the three research questions to interpret the diffusion process from the perspective of the visual concepts generated by the model and the region where the model attends in each time step. We devise tools for visualizing the diffusion process and answering the aforementioned research questions to render the diffusion process human-understandable. We show how the output is progressively generated in the diffusion process by explaining the level of denoising and highlighting relationships to foundational visual concepts at each time step through the results of experiments with various visual analyses using the tools. Throughout the training of the diffusion model, the model learns diverse visual concepts corresponding to each time-step, enabling the model to predict varying levels of visual concepts at different stages. We substantiate our tools using Area Under Cover (AUC) score, correlation quantification, and cross-attention mapping. Our findings provide insights into the diffusion process and pave the way for further research into explainable diffusion mechanisms.
- [1633] arXiv:2402.10409 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Understanding Survey Paper Taxonomy about Large Language Models via Graph Representation LearningComments: TL;DR: We collected metadata about LLM surveys and developed a method for categorizing them into a taxonomy, indicating the superiority of graph representation learning over language models and revealing the efficacy of fine-tuning using weak labelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Abstract: As new research on Large Language Models (LLMs) continues, it is difficult to keep up with new research and models. To help researchers synthesize the new research many have written survey papers, but even those have become numerous. In this paper, we develop a method to automatically assign survey papers to a taxonomy. We collect the metadata of 144 LLM survey papers and explore three paradigms to classify papers within the taxonomy. Our work indicates that leveraging graph structure information on co-category graphs can significantly outperform the language models in two paradigms; pre-trained language models' fine-tuning and zero-shot/few-shot classifications using LLMs. We find that our model surpasses an average human recognition level and that fine-tuning LLMs using weak labels generated by a smaller model, such as the GCN in this study, can be more effective than using ground-truth labels, revealing the potential of weak-to-strong generalization in the taxonomy classification task.
- [1634] arXiv:2402.10412 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Measuring and Reducing LLM Hallucination without Gold-Standard Answers via Expertise-WeightingComments: Paper Under ReviewSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: LLM hallucination, i.e. generating factually incorrect yet seemingly convincing answers, is currently a major threat to the trustworthiness and reliability of LLMs. The first step towards solving this complicated problem is to measure it. However, existing hallucination metrics require to have a benchmark dataset with gold-standard answers, i.e. "best" or "correct" answers written by humans. Such requirement makes hallucination measurement costly and prone to human errors. In this work, we propose Factualness Evaluations via Weighting LLMs (FEWL), the first hallucination metric that is specifically designed for the scenario when gold-standard answers are absent. FEWL leverages the answers from off-the-shelf LLMs that serve as a proxy of gold-standard answers. The key challenge is how to quantify the expertise of reference LLMs resourcefully. We show FEWL has certain theoretical guarantees and demonstrate empirically it gives more accurate hallucination measures than naively using reference LLMs. We also show how to leverage FEWL to reduce hallucination through both in-context learning and supervised finetuning. Last, we build a large-scale benchmark dataset to facilitate LLM hallucination research.
- [1635] arXiv:2402.10423 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: Connect the dots: Dataset Condensation, Differential Privacy, and Adversarial UncertaintySubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: Our work focuses on understanding the underpinning mechanism of dataset condensation by drawing connections with ($\epsilon$, $\delta$)-differential privacy where the optimal noise, $\epsilon$, is chosen by adversarial uncertainty \cite{Grining2017}. We can answer the question about the inner workings of the dataset condensation procedure. Previous work \cite{dong2022} proved the link between dataset condensation (DC) and ($\epsilon$, $\delta$)-differential privacy. However, it is unclear from existing works on ablating DC to obtain a lower-bound estimate of $\epsilon$ that will suffice for creating high-fidelity synthetic data. We suggest that adversarial uncertainty is the most appropriate method to achieve an optimal noise level, $\epsilon$. As part of the internal dynamics of dataset condensation, we adopt a satisfactory scheme for noise estimation that guarantees high-fidelity data while providing privacy.
- [1636] arXiv:2402.10424 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Understanding In-Context Learning with a Pelican Soup FrameworkSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Many existing theoretical analyses of in-context learning for natural language processing are based on latent variable models that leaves gaps between theory and practice. We aim to close these gaps by proposing a theoretical framework, the Pelican Soup Framework. In this framework, we introduce (1) the notion of a common sense knowledge base, (2) a general formalism for natural language classification tasks, and the notion of (3) meaning association. Under this framework, we can establish a $\mathcal{O}(1/T)$ loss bound for in-context learning, where $T$ is the number of example-label pairs in the demonstration. Compared with previous works, our bound reflects the effect of the choice of verbalizers and the effect of instruction tuning. An additional notion of \textit{atom concepts} makes our framework possible to explain the generalization to tasks unseen in the language model training data. Finally, we propose a toy setup, Calcutec, and a digit addition task that mimics types of distribution shifts a model needs to overcome to perform in-context learning. We also experiment with GPT2-Large on real-world NLP tasks. Our empirical results demonstrate the efficacy of our framework to explain in-context learning.
- [1637] arXiv:2402.10427 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Evaluating and Improving Continual Learning in Spoken Language UnderstandingSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract: Continual learning has emerged as an increasingly important challenge across various tasks, including Spoken Language Understanding (SLU). In SLU, its objective is to effectively handle the emergence of new concepts and evolving environments. The evaluation of continual learning algorithms typically involves assessing the model's stability, plasticity, and generalizability as fundamental aspects of standards. However, existing continual learning metrics primarily focus on only one or two of the properties. They neglect the overall performance across all tasks, and do not adequately disentangle the plasticity versus stability/generalizability trade-offs within the model. In this work, we propose an evaluation methodology that provides a unified evaluation on stability, plasticity, and generalizability in continual learning. By employing the proposed metric, we demonstrate how introducing various knowledge distillations can improve different aspects of these three properties of the SLU model. We further show that our proposed metric is more sensitive in capturing the impact of task ordering in continual learning, making it better suited for practical use-case scenarios.
- [1638] arXiv:2402.10466 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Large Language Models as Zero-shot Dialogue State Tracker through Function CallingZekun Li , Zhiyu Zoey Chen , Mike Ross , Patrick Huber , Seungwhan Moon , Zhaojiang Lin , Xin Luna Dong , Adithya Sagar , Xifeng Yan , Paul A. CrookComments: More results in the next version. Code available at: this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large language models (LLMs) are increasingly prevalent in conversational systems due to their advanced understanding and generative capabilities in general contexts. However, their effectiveness in task-oriented dialogues (TOD), which requires not only response generation but also effective dialogue state tracking (DST) within specific tasks and domains, remains less satisfying. In this work, we propose a novel approach FnCTOD for solving DST with LLMs through function calling. This method improves zero-shot DST, allowing adaptation to diverse domains without extensive data collection or model tuning. Our experimental results demonstrate that our approach achieves exceptional performance with both modestly sized open-source and also proprietary LLMs: with in-context prompting it enables various 7B or 13B parameter models to surpass the previous state-of-the-art (SOTA) achieved by ChatGPT, and improves ChatGPT's performance beating the SOTA by 5.6% average joint goal accuracy (JGA). Individual model results for GPT-3.5 and GPT-4 are boosted by 4.8% and 14%, respectively. We also show that by fine-tuning on a small collection of diverse task-oriented dialogues, we can equip modestly sized models, specifically a 13B parameter LLaMA2-Chat model, with function-calling capabilities and DST performance comparable to ChatGPT while maintaining their chat capabilities. We have made the code publicly available at this https URL
- [1639] arXiv:2402.10468 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Adversarial Curriculum Graph Contrastive Learning with Pair-wise AugmentationSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Graph contrastive learning (GCL) has emerged as a pivotal technique in the domain of graph representation learning. A crucial aspect of effective GCL is the caliber of generated positive and negative samples, which is intrinsically dictated by their resemblance to the original data. Nevertheless, precise control over similarity during sample generation presents a formidable challenge, often impeding the effective discovery of representative graph patterns. To address this challenge, we propose an innovative framework: Adversarial Curriculum Graph Contrastive Learning (ACGCL), which capitalizes on the merits of pair-wise augmentation to engender graph-level positive and negative samples with controllable similarity, alongside subgraph contrastive learning to discern effective graph patterns therein. Within the ACGCL framework, we have devised a novel adversarial curriculum training methodology that facilitates progressive learning by sequentially increasing the difficulty of distinguishing the generated samples. Notably, this approach transcends the prevalent sparsity issue inherent in conventional curriculum learning strategies by adaptively concentrating on more challenging training data. Finally, a comprehensive assessment of ACGCL is conducted through extensive experiments on six well-known benchmark datasets, wherein ACGCL conspicuously surpasses a set of state-of-the-art baselines.
- [1640] arXiv:2402.10481 (cross-list from q-fin.CP) [ pdf , ps , other ]
-
Title: Emoji Driven Crypto Assets Market ReactionsSubjects: Computational Finance (q-fin.CP) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Statistical Finance (q-fin.ST)
Abstract: In the burgeoning realm of cryptocurrency, social media platforms like Twitter have become pivotal in influencing market trends and investor sentiments. In our study, we leverage GPT-4 and a fine-tuned transformer-based BERT model for a multimodal sentiment analysis, focusing on the impact of emoji sentiment on cryptocurrency markets. By translating emojis into quantifiable sentiment data, we correlate these insights with key market indicators like BTC Price and the VCRIX index. Our architecture's analysis of emoji sentiment demonstrated a distinct advantage over FinBERT's pure text sentiment analysis in such predicting power. This approach may be fed into the development of trading strategies aimed at utilizing social media elements to identify and forecast market trends. Crucially, our findings suggest that strategies based on emoji sentiment can facilitate the avoidance of significant market downturns and contribute to the stabilization of returns. This research underscores the practical benefits of integrating advanced AI-driven analyses into financial strategies, offering a nuanced perspective on the interplay between digital communication and market dynamics in an academic context.
- [1641] arXiv:2402.10487 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Random Projection Layers for Multidimensional Time Series ForecastingChin-Chia Michael Yeh , Yujie Fan , Xin Dai , Vivian Lai , Prince Osei Aboagye , Junpeng Wang , Huiyuan Chen , Yan Zheng , Zhongfang Zhuang , Liang Wang , Wei ZhangSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: All-Multi-Layer Perceptron (all-MLP) mixer models have been shown to be effective for time series forecasting problems. However, when such a model is applied to high-dimensional time series (e.g., the time series in a spatial-temporal dataset), its performance is likely to degrade due to overfitting issues. In this paper, we propose an all-MLP time series forecasting architecture, referred to as RPMixer. Our method leverages the ensemble-like behavior of deep neural networks, where each individual block within the network acts like a base learner in an ensemble model, especially when identity mapping residual connections are incorporated. By integrating random projection layers into our model, we increase the diversity among the blocks' outputs, thereby enhancing the overall performance of RPMixer. Extensive experiments conducted on large-scale spatial-temporal forecasting benchmark datasets demonstrate that our proposed method outperforms alternative methods, including both spatial-temporal graph models and general forecasting models.
- [1642] arXiv:2402.10492 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Developing an Optimal Model for Predicting the Severity of Wheat Stem Rust (Case study of Arsi and Bale Zone)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: This research utilized three types of artificial neural network (ANN) methodologies, namely Backpropagation Neural Network (BPNN) with varied training, transfer, divide, and learning functions; Radial Basis Function Neural Network (RBFNN); and General Regression Neural Network (GRNN), to forecast the severity of stem rust. It considered parameters such as mean maximum temperature, mean minimum temperature, mean rainfall, mean average temperature, mean relative humidity, and different wheat varieties. The statistical analysis revealed that GRNN demonstrated effective predictive capability and required less training time compared to the other models. Additionally, the results indicated that total seasonal rainfall positively influenced the development of wheat stem rust.
Keywords: Wheat stem rust, Back propagation neural network, Radial Basis Function Neural Network, General Regression Neural Network. - [1643] arXiv:2402.10496 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Comparing Hallucination Detection Metrics for Multilingual GenerationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: While many automatic hallucination detection techniques have been proposed for English texts, their effectiveness in multilingual contexts remains unexplored. This paper aims to bridge the gap in understanding how these hallucination detection metrics perform on non-English languages. We evaluate the efficacy of various detection metrics, including lexical metrics like ROUGE and Named Entity Overlap and Natural Language Inference (NLI)-based metrics, at detecting hallucinations in biographical summaries in many languages; we also evaluate how correlated these different metrics are to gauge whether they measure the same phenomena. Our empirical analysis reveals that while lexical metrics show limited effectiveness, NLI-based metrics perform well in high-resource languages at the sentence level. In contrast, NLI-based metrics often fail to detect atomic fact hallucinations. Our findings highlight existing gaps in multilingual hallucination detection and motivate future research to develop more robust detection methods for LLM hallucination in other languages.
- [1644] arXiv:2402.10500 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Provably Sample Efficient RLHF via Active Preference OptimizationSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Reinforcement Learning from Human Feedback (RLHF) is pivotal in aligning Large Language Models (LLMs) with human preferences. While these aligned generative models have demonstrated impressive capabilities across various tasks, the dependence on high-quality human preference data poses a costly bottleneck in practical implementation of RLHF. Hence better and adaptive strategies for data collection is needed. To this end, we frame RLHF as a contextual preference bandit problem with prompts as contexts and show that the naive way of collecting preference data by choosing prompts uniformly at random leads to a policy that suffers an $\Omega(1)$ suboptimality gap in rewards. Then we propose $\textit{Active Preference Optimization}$ ($\texttt{APO}$), an algorithm that actively selects prompts to collect preference data. Under the Bradley-Terry-Luce (BTL) preference model, \texttt{APO} achieves sample efficiency without compromising on policy performance. We show that given a sample budget of $T$, the suboptimality gap of a policy learned via $\texttt{APO}$ scales as $O(1/\sqrt{T})$. Next, we propose a compute-efficient batch version of $\texttt{APO}$ with minor modification and evaluate its performance in practice. Experimental evaluations on a human preference dataset validate \texttt{APO}'s efficacy as a sample-efficient and practical solution to data collection for RLHF, facilitating alignment of LLMs with human preferences in a cost-effective and scalable manner.
- [1645] arXiv:2402.10510 (cross-list from cs.HC) [ pdf , ps , html , other ]
-
Title: Human Goal Recognition as Bayesian Inference: Investigating the Impact of Actions, Timing, and Goal SolvabilityComments: Accepted by AAMAS 2024Subjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Goal recognition is a fundamental cognitive process that enables individuals to infer intentions based on available cues. Current goal recognition algorithms often take only observed actions as input, but here we use a Bayesian framework to explore the role of actions, timing, and goal solvability in goal recognition. We analyze human responses to goal-recognition problems in the Sokoban domain, and find that actions are assigned most importance, but that timing and solvability also influence goal recognition in some cases, especially when actions are uninformative. We leverage these findings to develop a goal recognition model that matches human inferences more closely than do existing algorithms. Our work provides new insight into human goal recognition and takes a step towards more human-like AI models.
- [1646] arXiv:2402.10511 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Can Transformers Predict Vibrations?Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Abstract: Highly accurate time-series vibration prediction is an important research issue for electric vehicles (EVs). EVs often experience vibrations when driving on rough terrains, known as torsional resonance. This resonance, caused by the interaction between motor and tire vibrations, puts excessive loads on the vehicle's drive shaft. However, current damping technologies only detect resonance after the vibration amplitude of the drive shaft torque reaches a certain threshold, leading to significant loads on the shaft at the time of detection. In this study, we propose a novel approach to address this issue by introducing Resoformer, a transformer-based model for predicting torsional resonance. Resoformer utilizes time-series of the motor rotation speed as input and predicts the amplitude of torsional vibration at a specified quantile occurring in the shaft after the input series. By calculating the attention between recursive and convolutional features extracted from the measured data points, Resoformer improves the accuracy of vibration forecasting. To evaluate the model, we use a vibration dataset called VIBES (Dataset for Forecasting Vibration Transition in EVs), consisting of 2,600 simulator-generated vibration sequences. Our experiments, conducted on strong baselines built on the VIBES dataset, demonstrate that Resoformer achieves state-of-the-art results. In conclusion, our study answers the question "Can Transformers Forecast Vibrations?" While traditional transformer architectures show low performance in forecasting torsional resonance waves, our findings indicate that combining recurrent neural network and temporal convolutional network using the transformer architecture improves the accuracy of long-term vibration forecasting.
- [1647] arXiv:2402.10515 (cross-list from eess.SP) [ pdf , ps , html , other ]
-
Title: Power-Efficient Indoor Localization Using Adaptive Channel-aware Ultra-wideband DL-TDOAJournal-ref: IEEE GLOBECOM 2023Subjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI)
Abstract: Among the various Ultra-wideband (UWB) ranging methods, the absence of uplink communication or centralized computation makes downlink time-difference-of-arrival (DL-TDOA) localization the most suitable for large-scale industrial deployments. However, temporary or permanent obstacles in the deployment region often lead to non-line-of-sight (NLOS) channel path and signal outage effects, which result in localization errors. Prior research has addressed this problem by increasing the ranging frequency, which leads to a heavy increase in the user device power consumption. It also does not contribute to any increase in localization accuracy under line-of-sight (LOS) conditions. In this paper, we propose and implement a novel low-power channel-aware dynamic frequency DL-TDOA ranging algorithm. It comprises NLOS probability predictor based on a convolutional neural network (CNN), a dynamic ranging frequency control module, and an IMU sensor-based ranging filter. Based on the conducted experiments, we show that the proposed algorithm achieves 50% higher accuracy in NLOS conditions while having 46% lower power consumption in LOS conditions compared to baseline methods from prior research.
- [1648] arXiv:2402.10516 (cross-list from q-bio.BM) [ pdf , ps , other ]
-
Title: Generative AI for Controllable Protein Sequence Design: A SurveyYiheng Zhu , Zitai Kong , Jialu Wu , Weize Liu , Yuqiang Han , Mingze Yin , Hongxia Xu , Chang-Yu Hsieh , Tingjun HouComments: 9 pagesSubjects: Biomolecules (q-bio.BM) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The design of novel protein sequences with targeted functionalities underpins a central theme in protein engineering, impacting diverse fields such as drug discovery and enzymatic engineering. However, navigating this vast combinatorial search space remains a severe challenge due to time and financial constraints. This scenario is rapidly evolving as the transformative advancements in AI, particularly in the realm of generative models and optimization algorithms, have been propelling the protein design field towards an unprecedented revolution. In this survey, we systematically review recent advances in generative AI for controllable protein sequence design. To set the stage, we first outline the foundational tasks in protein sequence design in terms of the constraints involved and present key generative models and optimization algorithms. We then offer in-depth reviews of each design task and discuss the pertinent applications. Finally, we identify the unresolved challenges and highlight research opportunities that merit deeper exploration.
- [1649] arXiv:2402.10524 (cross-list from cs.HC) [ pdf , ps , html , other ]
-
Title: LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language ModelsMinsuk Kahng , Ian Tenney , Mahima Pushkarna , Michael Xieyang Liu , James Wexler , Emily Reif , Krystal Kallarackal , Minsuk Chang , Michael Terry , Lucas DixonSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Automatic side-by-side evaluation has emerged as a promising approach to evaluating the quality of responses from large language models (LLMs). However, analyzing the results from this evaluation approach raises scalability and interpretability challenges. In this paper, we present LLM Comparator, a novel visual analytics tool for interactively analyzing results from automatic side-by-side evaluation. The tool supports interactive workflows for users to understand when and why a model performs better or worse than a baseline model, and how the responses from two models are qualitatively different. We iteratively designed and developed the tool by closely working with researchers and engineers at a large technology company. This paper details the user challenges we identified, the design and development of the tool, and an observational study with participants who regularly evaluate their models.
- [1650] arXiv:2402.10528 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Can We Verify Step by Step for Incorrect Answer Detection?Comments: 8 pages, 6 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Chain-of-Thought (CoT) prompting has marked a significant advancement in enhancing the reasoning capabilities of large language models (LLMs). Previous studies have developed various extensions of CoT, which focus primarily on enhancing end-task performance. In addition, there has been research on assessing the quality of reasoning chains in CoT. This raises an intriguing question: Is it possible to predict the accuracy of LLM outputs by scrutinizing the reasoning chains they generate? To answer this research question, we introduce a benchmark, R2PE, designed specifically to explore the relationship between reasoning chains and performance in various reasoning tasks spanning five different domains. This benchmark aims to measure the falsehood of the final output of LLMs based on the reasoning steps. To make full use of information in multiple reasoning chains, we propose the process discernibility score (PDS) framework that beats the answer-checking baseline by a large margin. Concretely, this resulted in an average of 5.1% increase in the F1 score across all 45 subsets within R2PE. We further demonstrate our PDS's efficacy in advancing open-domain QA accuracy. Data and code are available at this https URL .
- [1651] arXiv:2402.10532 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Properties and Challenges of LLM-Generated ExplanationsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Abstract: The self-rationalising capabilities of large language models (LLMs) have been explored in restricted settings, using task/specific data sets. However, current LLMs do not (only) rely on specifically annotated data; nonetheless, they frequently explain their outputs. The properties of the generated explanations are influenced by the pre-training corpus and by the target data used for instruction fine-tuning. As the pre-training corpus includes a large amount of human-written explanations "in the wild", we hypothesise that LLMs adopt common properties of human explanations. By analysing the outputs for a multi-domain instruction fine-tuning data set, we find that generated explanations show selectivity and contain illustrative elements, but less frequently are subjective or misleading. We discuss reasons and consequences of the properties' presence or absence. In particular, we outline positive and negative implications depending on the goals and user groups of the self-rationalising system.
- [1652] arXiv:2402.10543 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Strong hallucinations from negation and how to fix themSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Despite great performance on many tasks, language models (LMs) still struggle with reasoning, sometimes providing responses that cannot possibly be true because they stem from logical incoherence. We call such responses \textit{strong hallucinations} and prove that they follow from an LM's computation of its internal representations for logical operators and outputs from those representations. Focusing on negation, we provide a novel solution in which negation is treated not as another element of a latent representation, but as \textit{an operation over an LM's latent representations that constrains how they may evolve}. We show that our approach improves model performance in cloze prompting and natural language inference tasks with negation without requiring training on sparse negative data.
- [1653] arXiv:2402.10567 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: InSaAF: Incorporating Safety through Accuracy and Fairness | Are LLMs ready for the Indian Legal Domain?Yogesh Tripathi , Raghav Donakanti , Sahil Girhepuje , Ishan Kavathekar , Bhaskara Hanuma Vedula , Gokul S Krishnan , Shreya Goyal , Anmol Goel , Balaraman Ravindran , Ponnurangam KumaraguruSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Recent advancements in language technology and Artificial Intelligence have resulted in numerous Language Models being proposed to perform various tasks in the legal domain ranging from predicting judgments to generating summaries. Despite their immense potential, these models have been proven to learn and exhibit societal biases and make unfair predictions. In this study, we explore the ability of Large Language Models (LLMs) to perform legal tasks in the Indian landscape when social factors are involved. We present a novel metric, $\beta$-weighted $\textit{Legal Safety Score ($LSS_{\beta}$)}$, which encapsulates both the fairness and accuracy aspects of the LLM. We assess LLMs' safety by considering its performance in the $\textit{Binary Statutory Reasoning}$ task and its fairness exhibition with respect to various axes of disparities in the Indian society. Task performance and fairness scores of LLaMA and LLaMA--2 models indicate that the proposed $LSS_{\beta}$ metric can effectively determine the readiness of a model for safe usage in the legal sector. We also propose finetuning pipelines, utilising specialised legal datasets, as a potential method to mitigate bias and improve model safety. The finetuning procedures on LLaMA and LLaMA--2 models increase the $LSS_{\beta}$, improving their usability in the Indian legal domain. Our code is publicly released.
- [1654] arXiv:2402.10571 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Direct Preference Optimization with an OffsetSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Direct preference optimization (DPO) is a successful fine-tuning strategy for aligning large language models with human preferences without the need to train a reward model or employ reinforcement learning. DPO, as originally formulated, relies on binary preference data and fine-tunes a language model to increase the likelihood of a preferred response over a dispreferred response. However, not all preference pairs are equal: while in some cases the preferred response is only slightly better than the dispreferred response, there can be a stronger preference for one response when, for example, the other response includes harmful or toxic content. In this paper, we propose a generalization of DPO, termed DPO with an offset (ODPO), that does not treat every preference pair equally during fine-tuning. Intuitively, ODPO requires the difference between the likelihood of the preferred and dispreferred response to be greater than an offset value. The offset is determined based on the extent to which one response is preferred over another. Our experiments on various tasks suggest that ODPO significantly outperforms DPO in aligning language models, especially when the number of preference pairs is limited.
- [1655] arXiv:2402.10575 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Symbolic Autoencoding for Self-Supervised Sequence LearningMohammad Hossein Amani , Nicolas Mario Baldwin , Amin Mansouri , Martin Josifoski , Maxime Peyrard , Robert WestSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Traditional language models, adept at next-token prediction in text sequences, often struggle with transduction tasks between distinct symbolic systems, particularly when parallel data is scarce. Addressing this issue, we introduce \textit{symbolic autoencoding} ($\Sigma$AE), a self-supervised framework that harnesses the power of abundant unparallel data alongside limited parallel data. $\Sigma$AE connects two generative models via a discrete bottleneck layer and is optimized end-to-end by minimizing reconstruction loss (simultaneously with supervised loss for the parallel data), such that the sequence generated by the discrete bottleneck can be read out as the transduced input sequence. We also develop gradient-based methods allowing for efficient self-supervised sequence learning despite the discreteness of the bottleneck. Our results demonstrate that $\Sigma$AE significantly enhances performance on transduction tasks, even with minimal parallel data, offering a promising solution for weakly supervised learning scenarios.
- [1656] arXiv:2402.10580 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Efficient Multi-task Uncertainties for Joint Semantic Segmentation and Monocular Depth EstimationComments: 17 pages, 5 figures, 10 tables, submitted to peer-reviewed journalSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Quantifying the predictive uncertainty emerged as a possible solution to common challenges like overconfidence or lack of explainability and robustness of deep neural networks, albeit one that is often computationally expensive. Many real-world applications are multi-modal in nature and hence benefit from multi-task learning. In autonomous driving, for example, the joint solution of semantic segmentation and monocular depth estimation has proven to be valuable. In this work, we first combine different uncertainty quantification methods with joint semantic segmentation and monocular depth estimation and evaluate how they perform in comparison to each other. Additionally, we reveal the benefits of multi-task learning with regard to the uncertainty quality compared to solving both tasks separately. Based on these insights, we introduce EMUFormer, a novel student-teacher distillation approach for joint semantic segmentation and monocular depth estimation as well as efficient multi-task uncertainty quantification. By implicitly leveraging the predictive uncertainties of the teacher, EMUFormer achieves new state-of-the-art results on Cityscapes and NYUv2 and additionally estimates high-quality predictive uncertainties for both tasks that are comparable or superior to a Deep Ensemble despite being an order of magnitude more efficient.
- [1657] arXiv:2402.10586 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Threads of Subtlety: Detecting Machine-Generated Texts Through Discourse MotifsComments: 25 pagesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: With the advent of large language models (LLM), the line between human-crafted and machine-generated texts has become increasingly blurred. This paper delves into the inquiry of identifying discernible and unique linguistic properties in texts that were written by humans, particularly uncovering the underlying discourse structures of texts beyond their surface structures. Introducing a novel methodology, we leverage hierarchical parse trees and recursive hypergraphs to unveil distinctive discourse patterns in texts produced by both LLMs and humans. Empirical findings demonstrate that, although both LLMs and humans generate distinct discourse patterns influenced by specific domains, human-written texts exhibit more structural variability, reflecting the nuanced nature of human writing in different domains. Notably, incorporating hierarchical discourse features enhances binary classifiers' overall performance in distinguishing between human-written and machine-generated texts, even on out-of-distribution and paraphrased samples. This underscores the significance of incorporating hierarchical discourse features in the analysis of text patterns. The code and dataset will be available at [TBA].
- [1658] arXiv:2402.10597 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Efficiency at Scale: Investigating the Performance of Diminutive Language Models in Clinical TasksNiall Taylor , Upamanyu Ghose , Omid Rohanian , Mohammadmahdi Nouriborji , Andrey Kormilitzin , David Clifton , Alejo Nevado-HolgadoSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The entry of large language models (LLMs) into research and commercial spaces has led to a trend of ever-larger models, with initial promises of generalisability, followed by a widespread desire to downsize and create specialised models without the need for complete fine-tuning, using Parameter Efficient Fine-tuning (PEFT) methods. We present an investigation into the suitability of different PEFT methods to clinical decision-making tasks, across a range of model sizes, including extremely small models with as few as $25$ million parameters.
Our analysis shows that the performance of most PEFT approaches varies significantly from one task to another, with the exception of LoRA, which maintains relatively high performance across all model sizes and tasks, typically approaching or matching full fine-tuned performance. The effectiveness of PEFT methods in the clinical domain is evident, particularly for specialised models which can operate on low-cost, in-house computing infrastructure. The advantages of these models, in terms of speed and reduced training costs, dramatically outweighs any performance gain from large foundation LLMs. Furthermore, we highlight how domain-specific pre-training interacts with PEFT methods and model size, and discuss how these factors interplay to provide the best efficiency-performance trade-off. Full code available at: tbd. - [1659] arXiv:2402.10601 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Jailbreaking Proprietary Large Language Models using Word Substitution CipherComments: 15 pagesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large Language Models (LLMs) are aligned to moral and ethical guidelines but remain susceptible to creative prompts called Jailbreak that can bypass the alignment process. However, most jailbreaking prompts contain harmful questions in the natural language (mainly English), which can be detected by the LLM themselves. In this paper, we present jailbreaking prompts encoded using cryptographic techniques. We first present a pilot study on the state-of-the-art LLM, GPT-4, in decoding several safe sentences that have been encrypted using various cryptographic techniques and find that a straightforward word substitution cipher can be decoded most effectively. Motivated by this result, we use this encoding technique for writing jailbreaking prompts. We present a mapping of unsafe words with safe words and ask the unsafe question using these mapped words. Experimental results show an attack success rate (up to 59.42%) of our proposed jailbreaking approach on state-of-the-art proprietary models including ChatGPT, GPT-4, and Gemini-Pro. Additionally, we discuss the over-defensiveness of these models. We believe that our work will encourage further research in making these LLMs more robust while maintaining their decoding capabilities.
- [1660] arXiv:2402.10614 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Can LLMs Speak For Diverse People? Tuning LLMs via Debate to Generate Controllable Controversial StatementsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Making LLMs speak for different, especially minority groups of people, and generate statements supporting their diverse or even controversial perspectives is critical to creating an inclusive environment. However, existing LLMs lack sufficient controllability to the stance of their generated content, which often contains inconsistent, neutral, or biased statements. In this paper, we improve the controllability of LLMs in generating statements supporting an argument the user defined in the prompt. We find that multi-round debates between two LLMs with opposite stances generate higher-quality and more salient statements for each, which are important training data to improve the controllability of LLMs. Motivated by this, we develop a novel debate & tuning ("DEBATunE") pipeline finetuning LLMs to generate the statements obtained via debate. To examine DEBATunE, we curate the largest dataset of debate topics so far, which covers 710 controversial topics and corresponding arguments for each topic. Evaluations by the GPT-4 judge with a novel controversy controllability metric show that LLMs' capability of expressing diverse perspectives is significantly improved by DEBATunE. Moreover, such controllability can be generalized to unseen topics, generating high-quality statements supporting controversial arguments. Our codes, models, and data will be released at this https URL .
- [1661] arXiv:2402.10617 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Multitask Kernel-based Learning with Logic ConstraintsComments: The 19th European Conference on Artificial Intelligence (ECAI 2010)Journal-ref: Proceedings of the 19th European Conference on Artificial Intelligence (ECAI 2010)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: This paper presents a general framework to integrate prior knowledge in the form of logic constraints among a set of task functions into kernel machines. The logic propositions provide a partial representation of the environment, in which the learner operates, that is exploited by the learning algorithm together with the information available in the supervised examples. In particular, we consider a multi-task learning scheme, where multiple unary predicates on the feature space are to be learned by kernel machines and a higher level abstract representation consists of logic clauses on these predicates, known to hold for any input. A general approach is presented to convert the logic clauses into a continuous implementation, that processes the outputs computed by the kernel-based predicates. The learning task is formulated as a primal optimization problem of a loss function that combines a term measuring the fitting of the supervised examples, a regularization term, and a penalty term that enforces the constraints on both supervised and unsupervised examples. The proposed semi-supervised learning framework is particularly suited for learning in high dimensionality feature spaces, where the supervised training examples tend to be sparse and generalization difficult. Unlike for standard kernel machines, the cost function to optimize is not generally guaranteed to be convex. However, the experimental results show that it is still possible to find good solutions using a two stage learning schema, in which first the supervised examples are learned until convergence and then the logic constraints are forced. Some promising experimental results on artificial multi-task learning tasks are reported, showing how the classification accuracy can be effectively improved by exploiting the a priori rules and the unsupervised examples.
- [1662] arXiv:2402.10634 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Graph-based Forecasting with Missing Data through Spatiotemporal DownsamplingSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Given a set of synchronous time series, each associated with a sensor-point in space and characterized by inter-series relationships, the problem of spatiotemporal forecasting consists of predicting future observations for each point. Spatiotemporal graph neural networks achieve striking results by representing the relationships across time series as a graph. Nonetheless, most existing methods rely on the often unrealistic assumption that inputs are always available and fail to capture hidden spatiotemporal dynamics when part of the data is missing. In this work, we tackle this problem through hierarchical spatiotemporal downsampling. The input time series are progressively coarsened over time and space, obtaining a pool of representations that capture heterogeneous temporal and spatial dynamics. Conditioned on observations and missing data patterns, such representations are combined by an interpretable attention mechanism to generate the forecasts. Our approach outperforms state-of-the-art methods on synthetic and real-world benchmarks under different missing data distributions, particularly in the presence of contiguous blocks of missing values.
- [1663] arXiv:2402.10635 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: ContiFormer: Continuous-Time Transformer for Irregular Time Series ModelingComments: Neurips 2023 PosterSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Modeling continuous-time dynamics on irregular time series is critical to account for data evolution and correlations that occur continuously. Traditional methods including recurrent neural networks or Transformer models leverage inductive bias via powerful neural architectures to capture complex patterns. However, due to their discrete characteristic, they have limitations in generalizing to continuous-time data paradigms. Though neural ordinary differential equations (Neural ODEs) and their variants have shown promising results in dealing with irregular time series, they often fail to capture the intricate correlations within these sequences. It is challenging yet demanding to concurrently model the relationship between input data points and capture the dynamic changes of the continuous-time system. To tackle this problem, we propose ContiFormer that extends the relation modeling of vanilla Transformer to the continuous-time domain, which explicitly incorporates the modeling abilities of continuous dynamics of Neural ODEs with the attention mechanism of Transformers. We mathematically characterize the expressive power of ContiFormer and illustrate that, by curated designs of function hypothesis, many Transformer variants specialized in irregular time series modeling can be covered as a special case of ContiFormer. A wide range of experiments on both synthetic and real-world datasets have illustrated the superior modeling capacities and prediction performance of ContiFormer on irregular time series data. The project link is this https URL .
- [1664] arXiv:2402.10642 (cross-list from eess.AS) [ pdf , ps , other ]
-
Title: Speaking in Wavelet Domain: A Simple and Efficient Approach to Speed up Speech Diffusion ModelXiangyu Zhang , Daijiao Liu , Hexin Liu , Qiquan Zhang , Hanyu Meng , Leibny Paola Garcia , Eng Siong Chng , Lina YaoSubjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI)
Abstract: Recently, Denoising Diffusion Probabilistic Models (DDPMs) have attained leading performances across a diverse range of generative tasks. However, in the field of speech synthesis, although DDPMs exhibit impressive performance, their long training duration and substantial inference costs hinder practical deployment. Existing approaches primarily focus on enhancing inference speed, while approaches to accelerate training a key factor in the costs associated with adding or customizing voices often necessitate complex modifications to the model, compromising their universal applicability. To address the aforementioned challenges, we propose an inquiry: is it possible to enhance the training/inference speed and performance of DDPMs by modifying the speech signal itself? In this paper, we double the training and inference speed of Speech DDPMs by simply redirecting the generative target to the wavelet domain. This method not only achieves comparable or superior performance to the original model in speech synthesis tasks but also demonstrates its versatility. By investigating and utilizing different wavelet bases, our approach proves effective not just in speech synthesis, but also in speech enhancement.
- [1665] arXiv:2402.10643 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: `Keep it Together': Enforcing Cohesion in Extractive Summaries by Simulating Human MemorySubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Extractive summaries are usually presented as lists of sentences with no expected cohesion between them. In this paper, we aim to enforce cohesion whilst controlling for informativeness and redundancy in summaries, in cases where the input exhibits high redundancy. The pipeline controls for redundancy in long inputs as it is consumed, and balances informativeness and cohesion during sentence selection. Our sentence selector simulates human memory to keep track of topics --modeled as lexical chains--, enforcing cohesive ties between noun phrases. Across a variety of domains, our experiments revealed that it is possible to extract highly cohesive summaries that nevertheless read as informative to humans as summaries extracted by only accounting for informativeness or redundancy. The extracted summaries exhibit smooth topic transitions between sentences as signaled by lexical chains, with chains spanning adjacent or near-adjacent sentences.
- [1666] arXiv:2402.10645 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Can Separators Improve Chain-of-Thought Prompting?Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Chain-of-thought (CoT) prompting is a simple and effective method for improving the reasoning capabilities of Large language models (LLMs). The basic idea of CoT is to let LLMs break down their thought processes step-by-step by putting exemplars in the input prompt. However, the densely structured prompt exemplars of CoT may cause the cognitive overload of LLMs. Inspired by human cognition, we introduce CoT-Sep, a novel method that strategically employs separators at the end of each exemplar in CoT prompting. These separators are designed to help the LLMs understand their thought processes better while reasoning. It turns out that CoT-Sep significantly improves the LLMs' performances on complex reasoning tasks (e.g., GSM-8K, AQuA, CSQA), compared with the vanilla CoT, which does not use separators. We also study the effects of the type and the location of separators tested on multiple LLMs, including GPT-3.5-Turbo, GPT-4, and LLaMA-2 7B. Interestingly, the type/location of separators should be chosen appropriately to boost the reasoning capability of CoT.
- [1667] arXiv:2402.10649 (cross-list from math.NA) [ pdf , ps , other ]
-
Title: Hermite Neural Network Simulation for Solving the 2D Schrodinger EquationSubjects: Numerical Analysis (math.NA) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Analysis of PDEs (math.AP)
Abstract: The Schrodinger equation is a mathematical equation describing the wave function's behavior in a quantum-mechanical system. It is a partial differential equation that provides valuable insights into the fundamental principles of quantum mechanics. In this paper, the aim was to solve the Schrodinger equation with sufficient accuracy by using a mixture of neural networks with the collocation method base Hermite functions. Initially, the Hermite functions roots were employed as collocation points, enhancing the efficiency of the solution. The Schrodinger equation is defined in an infinite domain, the use of Hermite functions as activation functions resulted in excellent precision. Finally, the proposed method was simulated using MATLAB's Simulink tool. The results were then compared with those obtained using Physics-informed neural networks and the presented method.
- [1668] arXiv:2402.10659 (cross-list from cs.SI) [ pdf , ps , html , other ]
-
Title: Network Formation and Dynamics Among Multi-LLMsSubjects: Social and Information Networks (cs.SI) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multiagent Systems (cs.MA)
Abstract: Social networks shape opinions, behaviors, and information dissemination in human societies. As large language models (LLMs) increasingly integrate into social and professional environments, understanding their behavior within the context of social interactions and networks becomes essential. Our study analyzes LLMs' network formation behavior to examine whether the dynamics of multiple LLMs are similar to or different from human social dynamics. We observe that LLMs exhibit key social network principles, including preferential attachment, triadic closure, homophily, community structure, and the small-world phenomenon, when asked about their preferences in network formation. We also investigate LLMs' decision-making based on real-world networks, revealing that triadic closure and homophily have a stronger influence than preferential attachment and that LLMs perform well in network formation predictions. Overall, our study opens up new possibilities for using LLMs in network science research and helps develop socially aware LLMs by shedding light on their network formation behaviors and exploring their impacts on social dynamics.
- [1669] arXiv:2402.10681 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Physics-informed MeshGraphNets (PI-MGNs): Neural finite element solvers for non-stationary and nonlinear simulations on arbitrary meshesComments: Submitted to CMAMESubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)
Abstract: Engineering components must meet increasing technological demands in ever shorter development cycles. To face these challenges, a holistic approach is essential that allows for the concurrent development of part design, material system and manufacturing process. Current approaches employ numerical simulations, which however quickly becomes computation-intensive, especially for iterative optimization. Data-driven machine learning methods can be used to replace time- and resource-intensive numerical simulations. In particular, MeshGraphNets (MGNs) have shown promising results. They enable fast and accurate predictions on unseen mesh geometries while being fully differentiable for optimization. However, these models rely on large amounts of expensive training data, such as numerical simulations. Physics-informed neural networks (PINNs) offer an opportunity to train neural networks with partial differential equations instead of labeled data, but have not been extended yet to handle time-dependent simulations of arbitrary meshes. This work introduces PI-MGNs, a hybrid approach that combines PINNs and MGNs to quickly and accurately solve non-stationary and nonlinear partial differential equations (PDEs) on arbitrary meshes. The method is exemplified for thermal process simulations of unseen parts with inhomogeneous material distribution. Further results show that the model scales well to large and complex meshes, although it is trained on small generic meshes only.
- [1670] arXiv:2402.10685 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: LongHeads: Multi-Head Attention is Secretly a Long Context ProcessorSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large language models (LLMs) have achieved impressive performance in numerous domains but often struggle to process lengthy inputs effectively and efficiently due to limited length generalization and attention's quadratic computational demands. Many sought to mitigate this by restricting the attention window within the pre-trained length. However, these methods introduce new issues such as ignoring the middle context and requiring additional training. To address these problems, we propose LongHeads, a training-free framework that enhances LLM's long context ability by unlocking multi-head attention's untapped potential. Instead of allowing each head to attend to the full sentence, which struggles with generalizing to longer sequences due to out-of-distribution (OOD) issues, we allow each head to process in-distribution length by selecting and attending to important context chunks. To this end, we propose a chunk selection strategy that relies on the inherent correlation between the query and the key representations, efficiently distributing context chunks to different heads. In this way, each head ensures it can effectively process attended tokens within the trained length, while different heads in different layers can collectively process longer contexts. LongHeads works efficiently in linear time, fits seamlessly with many LLMs that use relative positional encoding. LongHeads achieves 100% accuracy at the 128k length on passkey retrieval task, verifying LongHeads's efficacy in extending the usable context window for existing models. We release our code at this https URL .
- [1671] arXiv:2402.10695 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Unlink to Unlearn: Simplifying Edge Unlearning in GNNsComments: Accepted by WWW 2024 as a Short Research PaperSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: As concerns over data privacy intensify, unlearning in Graph Neural Networks (GNNs) has emerged as a prominent research frontier in academia. This concept is pivotal in enforcing the \textit{right to be forgotten}, which entails the selective removal of specific data from trained GNNs upon user request. Our research focuses on edge unlearning, a process of particular relevance to real-world applications. Current state-of-the-art approaches like GNNDelete can eliminate the influence of specific edges yet suffer from \textit{over-forgetting}, which means the unlearning process inadvertently removes excessive information beyond needed, leading to a significant performance decline for remaining edges. Our analysis identifies the loss functions of GNNDelete as the primary source of over-forgetting and also suggests that loss functions may be redundant for effective edge unlearning. Building on these insights, we simplify GNNDelete to develop \textbf{Unlink to Unlearn} (UtU), a novel method that facilitates unlearning exclusively through unlinking the forget edges from graph structure. Our extensive experiments demonstrate that UtU delivers privacy protection on par with that of a retrained model while preserving high accuracy in downstream tasks, by upholding over 97.3\% of the retrained model's privacy protection capabilities and 99.8\% of its link prediction accuracy. Meanwhile, UtU requires only constant computational demands, underscoring its advantage as a highly lightweight and practical edge unlearning solution.
- [1672] arXiv:2402.10701 (cross-list from cs.NI) [ pdf , ps , other ]
-
Title: Does Twinning Vehicular Networks Enhance Their Performance in Dense Areas?Comments: 6 pages, 8 figures, 2tables, conference paperSubjects: Networking and Internet Architecture (cs.NI) ; Artificial Intelligence (cs.AI)
Abstract: This paper investigates the potential of Digital Twins (DTs) to enhance network performance in densely populated urban areas, specifically focusing on vehicular networks. The study comprises two phases. In Phase I, we utilize traffic data and AI clustering to identify critical locations, particularly in crowded urban areas with high accident rates. In Phase II, we evaluate the advantages of twinning vehicular networks through three deployment scenarios: edge-based twin, cloud-based twin, and hybrid-based twin. Our analysis demonstrates that twinning significantly reduces network delays, with virtual twins outperforming physical networks. Virtual twins maintain low delays even with increased vehicle density, such as 15.05 seconds for 300 vehicles. Moreover, they exhibit faster computational speeds, with cloud-based twins being 1.7 times faster than edge twins in certain scenarios. These findings provide insights for efficient vehicular communication and underscore the potential of virtual twins in enhancing vehicular networks in crowded areas while emphasizing the importance of considering real-world factors when making deployment decisions.
- [1673] arXiv:2402.10712 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: An Empirical Study on Cross-lingual Vocabulary Adaptation for Efficient Generative LLM InferenceSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The development of state-of-the-art generative large language models (LLMs) disproportionately relies on English-centric tokenizers, vocabulary and pre-training data. Despite the fact that some LLMs have multilingual capabilities, recent studies have shown that their inference efficiency deteriorates when generating text in languages other than English. This results in increased inference time and costs. Cross-lingual vocabulary adaptation methods have been proposed for adapting models to a target language aiming to improve downstream performance. However, the effectiveness of these methods on increasing inference efficiency of generative LLMs has yet to be explored. In this paper, we perform an empirical study of various cross-lingual vocabulary adaptation methods on five generative LLMs (including monolingual and multilingual models) across four typologically-diverse languages and four natural language understanding tasks. We find that cross-lingual vocabulary adaptation substantially contributes to LLM inference speedups of up to 271.5%. We also show that adapting LLMs that have been pre-trained on more balanced multilingual data results in downstream performance comparable to the original models.
- [1674] arXiv:2402.10717 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: BioFusionNet: Deep Learning-Based Survival Risk Stratification in ER+ Breast Cancer Through Multifeature and Multimodal Data FusionComments: Keywords: Multimodal Fusion, Breast Cancer, Whole Slide Images, Deep Neural Network, Survival PredictionSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Breast cancer is a significant health concern affecting millions of women worldwide. Accurate survival risk stratification plays a crucial role in guiding personalised treatment decisions and improving patient outcomes. Here we present BioFusionNet, a deep learning framework that fuses image-derived features with genetic and clinical data to achieve a holistic patient profile and perform survival risk stratification of ER+ breast cancer patients. We employ multiple self-supervised feature extractors, namely DINO and MoCoV3, pretrained on histopathology patches to capture detailed histopathological image features. We then utilise a variational autoencoder (VAE) to fuse these features, and harness the latent space of the VAE to feed into a self-attention network, generating patient-level features. Next, we develop a co-dual-cross-attention mechanism to combine the histopathological features with genetic data, enabling the model to capture the interplay between them. Additionally, clinical data is incorporated using a feed-forward network (FFN), further enhancing predictive performance and achieving comprehensive multimodal feature integration. Furthermore, we introduce a weighted Cox loss function, specifically designed to handle imbalanced survival data, which is a common challenge in the field. The proposed model achieves a mean concordance index (C-index) of 0.77 and a time-dependent area under the curve (AUC) of 0.84, outperforming state-of-the-art methods. It predicts risk (high versus low) with prognostic significance for overall survival (OS) in univariate analysis (HR=2.99, 95% CI: 1.88--4.78, p<0.005), and maintains independent significance in multivariate analysis incorporating standard clinicopathological variables (HR=2.91, 95% CI: 1.80--4.68, p<0.005). The proposed method not only improves model performance but also addresses a critical gap in handling imbalanced data.
- [1675] arXiv:2402.10744 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: GenRES: Rethinking Evaluation for Generative Relation Extraction in the Era of Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The field of relation extraction (RE) is experiencing a notable shift towards generative relation extraction (GRE), leveraging the capabilities of large language models (LLMs). However, we discovered that traditional relation extraction (RE) metrics like precision and recall fall short in evaluating GRE methods. This shortfall arises because these metrics rely on exact matching with human-annotated reference relations, while GRE methods often produce diverse and semantically accurate relations that differ from the references. To fill this gap, we introduce GenRES for a multi-dimensional assessment in terms of the topic similarity, uniqueness, granularity, factualness, and completeness of the GRE results. With GenRES, we empirically identified that (1) precision/recall fails to justify the performance of GRE methods; (2) human-annotated referential relations can be incomplete; (3) prompting LLMs with a fixed set of relations or entities can cause hallucinations. Next, we conducted a human evaluation of GRE methods that shows GenRES is consistent with human preferences for RE quality. Last, we made a comprehensive evaluation of fourteen leading LLMs using GenRES across document, bag, and sentence level RE datasets, respectively, to set the benchmark for future research in GRE
- [1676] arXiv:2402.10747 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Fully Differentiable Lagrangian Convolutional Neural Network for Continuity-Consistent Physics-Informed Precipitation NowcastingComments: Submitted to ICML 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: This paper presents a convolutional neural network model for precipitation nowcasting that combines data-driven learning with physics-informed domain knowledge. We propose LUPIN, a Lagrangian Double U-Net for Physics-Informed Nowcasting, that draws from existing extrapolation-based nowcasting methods and implements the Lagrangian coordinate system transformation of the data in a fully differentiable and GPU-accelerated manner to allow for real-time end-to-end training and inference. Based on our evaluation, LUPIN matches and exceeds the performance of the chosen benchmark, opening the door for other Lagrangian machine learning models.
- [1677] arXiv:2402.10753 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three StagesJunjie Ye , Sixian Li , Guanyu Li , Caishuang Huang , Songyang Gao , Yilong Wu , Qi Zhang , Tao Gui , Xuanjing HuangSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Tool learning is widely acknowledged as a foundational approach or deploying large language models (LLMs) in real-world scenarios. While current research primarily emphasizes leveraging tools to augment LLMs, it frequently neglects emerging safety considerations tied to their application. To fill this gap, we present $ToolSword$, a comprehensive framework dedicated to meticulously investigating safety issues linked to LLMs in tool learning. Specifically, ToolSword delineates six safety scenarios for LLMs in tool learning, encompassing $malicious$ $queries$ and $jailbreak$ $attacks$ in the input stage, $noisy$ $misdirection$ and $risky$ $cues$ in the execution stage, and $harmful$ $feedback$ and $error$ $conflicts$ in the output stage. Experiments conducted on 11 open-source and closed-source LLMs reveal enduring safety challenges in tool learning, such as handling harmful queries, employing risky tools, and delivering detrimental feedback, which even GPT-4 is susceptible to. Moreover, we conduct further studies with the aim of fostering research on tool learning safety. The data is released in this https URL .
- [1678] arXiv:2402.10756 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Towards Cohesion-Fairness Harmony: Contrastive Regularization in Individual Fair Graph ClusteringComments: To be published in "The 28th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2024)"Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Information Theory (cs.IT); Social and Information Networks (cs.SI)
Abstract: Conventional fair graph clustering methods face two primary challenges: i) They prioritize balanced clusters at the expense of cluster cohesion by imposing rigid constraints, ii) Existing methods of both individual and group-level fairness in graph partitioning mostly rely on eigen decompositions and thus, generally lack interpretability. To address these issues, we propose iFairNMTF, an individual Fairness Nonnegative Matrix Tri-Factorization model with contrastive fairness regularization that achieves balanced and cohesive clusters. By introducing fairness regularization, our model allows for customizable accuracy-fairness trade-offs, thereby enhancing user autonomy without compromising the interpretability provided by nonnegative matrix tri-factorization. Experimental evaluations on real and synthetic datasets demonstrate the superior flexibility of iFairNMTF in achieving fairness and clustering performance.
- [1679] arXiv:2402.10765 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Policy Learning for Off-Dynamics RL with Deficient SupportComments: Accepted by AAMAS 2024 as a full paperSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Reinforcement Learning (RL) can effectively learn complex policies. However, learning these policies often demands extensive trial-and-error interactions with the environment. In many real-world scenarios, this approach is not practical due to the high costs of data collection and safety concerns. As a result, a common strategy is to transfer a policy trained in a low-cost, rapid source simulator to a real-world target environment. However, this process poses challenges. Simulators, no matter how advanced, cannot perfectly replicate the intricacies of the real world, leading to dynamics discrepancies between the source and target environments. Past research posited that the source domain must encompass all possible target transitions, a condition we term full support. However, expecting full support is often unrealistic, especially in scenarios where significant dynamics discrepancies arise. In this paper, our emphasis shifts to addressing large dynamics mismatch adaptation. We move away from the stringent full support condition of earlier research, focusing instead on crafting an effective policy for the target domain. Our proposed approach is simple but effective. It is anchored in the central concepts of the skewing and extension of source support towards target support to mitigate support deficiencies. Through comprehensive testing on a varied set of benchmarks, our method's efficacy stands out, showcasing notable improvements over previous techniques.
- [1680] arXiv:2402.10767 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Inference to the Best Explanation in Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: While Large Language Models (LLMs) have found success in real-world applications, their underlying explanatory process is still poorly understood. This paper proposes IBE-Eval, a framework inspired by philosophical accounts on Inference to the Best Explanation (IBE) to advance the interpretation and evaluation of LLMs' explanations. IBE-Eval estimates the plausibility of natural language explanations through a combination of explicit logical and linguistic features including: consistency, parsimony, coherence, and uncertainty. Extensive experiments are conducted on Causal Question Answering (CQA), where \textit{IBE-Eval} is tasked to select the most plausible causal explanation amongst competing ones generated by LLMs (i.e., GPT 3.5 and Llama 2). The experiments reveal that IBE-Eval can successfully identify the best explanation with up to 77\% accuracy ($\approx 27\%$ above random), improving upon a GPT 3.5-as-a-Judge baseline ($\approx+17\%$) while being intrinsically more efficient and interpretable. Additional analyses suggest that, despite model-specific variances, LLM-generated explanations tend to conform to IBE criteria and that IBE-Eval is significantly correlated with human judgment, opening up opportunities for future development of automated explanation verification tools.
- [1681] arXiv:2402.10769 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Distillation Enhanced Generative RetrievalSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract: Generative retrieval is a promising new paradigm in text retrieval that generates identifier strings of relevant passages as the retrieval target. This paradigm leverages powerful generative language models, distinct from traditional sparse or dense retrieval methods. In this work, we identify a viable direction to further enhance generative retrieval via distillation and propose a feasible framework, named DGR. DGR utilizes sophisticated ranking models, such as the cross-encoder, in a teacher role to supply a passage rank list, which captures the varying relevance degrees of passages instead of binary hard labels; subsequently, DGR employs a specially designed distilled RankNet loss to optimize the generative retrieval model, considering the passage rank order provided by the teacher model as labels. This framework only requires an additional distillation step to enhance current generative retrieval systems and does not add any burden to the inference stage. We conduct experiments on four public datasets, and the results indicate that DGR achieves state-of-the-art performance among the generative retrieval methods. Additionally, DGR demonstrates exceptional robustness and generalizability with various teacher models and distillation losses.
- [1682] arXiv:2402.10770 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: How Reliable Are Automatic Evaluation Methods for Instruction-Tuned LLMs?Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Work on instruction-tuned Large Language Models (LLMs) has used automatic methods based on text overlap and LLM judgments as cost-effective alternatives to human evaluation. In this paper, we study the reliability of such methods across a broad range of tasks and in a cross-lingual setting. In contrast to previous findings, we observe considerable variability in correlations between automatic methods and human evaluators when scores are differentiated by task type. Specifically, the widely-used ROUGE-L metric strongly correlates with human judgments for short-answer English tasks but is unreliable in free-form generation tasks and cross-lingual transfer. The effectiveness of GPT-4 as an evaluator depends on including reference answers when prompting for assessments, which can lead to overly strict evaluations in free-form generation tasks. In summary, we find that, while automatic evaluation methods can approximate human judgements under specific conditions, their reliability is highly context-dependent. Our findings enhance the understanding of how automatic methods should be applied and interpreted when developing and evaluating instruction-tuned LLMs.
- [1683] arXiv:2402.10774 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Error Feedback Reloaded: From Quadratic to Arithmetic Mean of Smoothness ConstantsComments: 70 pages, 14 figures, 6 tablesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
Abstract: Error Feedback (EF) is a highly popular and immensely effective mechanism for fixing convergence issues which arise in distributed training methods (such as distributed GD or SGD) when these are enhanced with greedy communication compression techniques such as TopK. While EF was proposed almost a decade ago (Seide et al., 2014), and despite concentrated effort by the community to advance the theoretical understanding of this mechanism, there is still a lot to explore. In this work we study a modern form of error feedback called EF21 (Richtarik et al., 2021) which offers the currently best-known theoretical guarantees, under the weakest assumptions, and also works well in practice. In particular, while the theoretical communication complexity of EF21 depends on the quadratic mean of certain smoothness parameters, we improve this dependence to their arithmetic mean, which is always smaller, and can be substantially smaller, especially in heterogeneous data regimes. We take the reader on a journey of our discovery process. Starting with the idea of applying EF21 to an equivalent reformulation of the underlying problem which (unfortunately) requires (often impractical) machine cloning, we continue to the discovery of a new weighted version of EF21 which can (fortunately) be executed without any cloning, and finally circle back to an improved analysis of the original EF21 method. While this development applies to the simplest form of EF21, our approach naturally extends to more elaborate variants involving stochastic gradients and partial participation. Further, our technique improves the best-known theory of EF21 in the rare features regime (Richtarik et al., 2023). Finally, we validate our theoretical findings with suitable experiments.
- [1684] arXiv:2402.10778 (cross-list from cs.RO) [ pdf , ps , html , other ]
-
Title: AutoGPT+P: Affordance-based Task Planning with Large Language ModelsComments: 12 pages, 16 pages including references and appendix, 5 figuresSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: Recent advances in task planning leverage Large Language Models (LLMs) to improve generalizability by combining such models with classical planning algorithms to address their inherent limitations in reasoning capabilities. However, these approaches face the challenge of dynamically capturing the initial state of the task planning problem. To alleviate this issue, we propose AutoGPT+P, a system that combines an affordance-based scene representation with a planning system. Affordances encompass the action possibilities of an agent on the environment and objects present in it. Thus, deriving the planning domain from an affordance-based scene representation allows symbolic planning with arbitrary objects. AutoGPT+P leverages this representation to derive and execute a plan for a task specified by the user in natural language. In addition to solving planning tasks under a closed-world assumption, AutoGPT+P can also handle planning with incomplete information, e. g., tasks with missing objects by exploring the scene, suggesting alternatives, or providing a partial plan. The affordance-based scene representation combines object detection with an automatically generated object-affordance-mapping using ChatGPT. The core planning tool extends existing work by automatically correcting semantic and syntactic errors. Our approach achieves a success rate of 98%, surpassing the current 81% success rate of the current state-of-the-art LLM-based planning method SayCan on the SayCan instruction set. Furthermore, we evaluated our approach on our newly created dataset with 150 scenarios covering a wide range of complex tasks with missing objects, achieving a success rate of 79% on our dataset. The dataset and the code are publicly available at this https URL .
- [1685] arXiv:2402.10787 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: EdgeQAT: Entropy and Distribution Guided Quantization-Aware Training for the Acceleration of Lightweight LLMs on the EdgeXuan Shen , Zhenglun Kong , Changdi Yang , Zhaoyang Han , Lei Lu , Peiyan Dong , Cheng Lyu , Chih-hsiang Li , Xuehang Guo , Zhihao Shu , Wei Niu , Miriam Leeser , Pu Zhao , Yanzhi WangComments: PreprintSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Despite the remarkable strides of Large Language Models (LLMs) in various fields, the wide applications of LLMs on edge devices are limited due to their massive parameters and computations. To address this, quantization is commonly adopted to generate lightweight LLMs with efficient computations and fast inference. However, Post-Training Quantization (PTQ) methods dramatically degrade in quality when quantizing weights, activations, and KV cache together to below 8 bits. Besides, many Quantization-Aware Training (QAT) works quantize model weights, leaving the activations untouched, which do not fully exploit the potential of quantization for inference acceleration on the edge. In this paper, we propose EdgeQAT, the Entropy and Distribution Guided QAT for the optimization of lightweight LLMs to achieve inference acceleration on Edge devices. We first identify that the performance drop of quantization primarily stems from the information distortion in quantized attention maps, demonstrated by the different distributions in quantized query and key of the self-attention mechanism. Then, the entropy and distribution guided QAT is proposed to mitigate the information distortion. Moreover, we design a token importance-aware adaptive method to dynamically quantize the tokens with different bit widths for further optimization and acceleration. Our extensive experiments verify the substantial improvements with our framework across various datasets. Furthermore, we achieve an on-device speedup of up to 2.37x compared with its FP16 counterparts across multiple edge devices, signaling a groundbreaking advancement.
- [1686] arXiv:2402.10790 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs MissComments: 11M tokens, fix qa3 min facts per task in Table 1Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This paper addresses the challenge of processing long documents using generative transformer models. To evaluate different approaches, we introduce BABILong, a new benchmark designed to assess model capabilities in extracting and processing distributed facts within extensive texts. Our evaluation, which includes benchmarks for GPT-4 and RAG, reveals that common methods are effective only for sequences up to $10^4$ elements. In contrast, fine-tuning GPT-2 with recurrent memory augmentations enables it to handle tasks involving up to $11\times 10^6$ elements. This achievement marks a substantial leap, as it is by far the longest input processed by any neural network model to date, demonstrating a significant improvement in the processing capabilities for long sequences.
- [1687] arXiv:2402.10793 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Masked Attention is All You Need for GraphsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Graph neural networks (GNNs) and variations of the message passing algorithm are the predominant means for learning on graphs, largely due to their flexibility, speed, and satisfactory performance. The design of powerful and general purpose GNNs, however, requires significant research efforts and often relies on handcrafted, carefully-chosen message passing operators. Motivated by this, we propose a remarkably simple alternative for learning on graphs that relies exclusively on attention. Graphs are represented as node or edge sets and their connectivity is enforced by masking the attention weight matrix, effectively creating custom attention patterns for each graph. Despite its simplicity, masked attention for graphs (MAG) has state-of-the-art performance on long-range tasks and outperforms strong message passing baselines and much more involved attention-based methods on over 55 node and graph-level tasks. We also show significantly better transfer learning capabilities compared to GNNs and comparable or better time and memory scaling. MAG has sub-linear memory scaling in the number of nodes or edges, enabling learning on dense graphs and future-proofing the approach.
- [1688] arXiv:2402.10798 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: VATr++: Choose Your Words Wisely for Handwritten Text GenerationBram Vanherle , Vittorio Pippi , Silvia Cascianelli , Nick Michiels , Frank Van Reeth , Rita CucchiaraSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Styled Handwritten Text Generation (HTG) has received significant attention in recent years, propelled by the success of learning-based solutions employing GANs, Transformers, and, preliminarily, Diffusion Models. Despite this surge in interest, there remains a critical yet understudied aspect - the impact of the input, both visual and textual, on the HTG model training and its subsequent influence on performance. This study delves deeper into a cutting-edge Styled-HTG approach, proposing strategies for input preparation and training regularization that allow the model to achieve better performance and generalize better. These aspects are validated through extensive analysis on several different settings and datasets. Moreover, in this work, we go beyond performance optimization and address a significant hurdle in HTG research - the lack of a standardized evaluation protocol. In particular, we propose a standardization of the evaluation protocol for HTG and conduct a comprehensive benchmarking of existing approaches. By doing so, we aim to establish a foundation for fair and meaningful comparisons between HTG strategies, fostering progress in the field.
- [1689] arXiv:2402.10803 (cross-list from q-fin.CP) [ pdf , ps , html , other ]
-
Title: Modelling crypto markets by multi-agent reinforcement learningSubjects: Computational Finance (q-fin.CP) ; Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA)
Abstract: Building on a previous foundation work (Lussange et al. 2020), this study introduces a multi-agent reinforcement learning (MARL) model simulating crypto markets, which is calibrated to the Binance's daily closing prices of $153$ cryptocurrencies that were continuously traded between 2018 and 2022. Unlike previous agent-based models (ABM) or multi-agent systems (MAS) which relied on zero-intelligence agents or single autonomous agent methodologies, our approach relies on endowing agents with reinforcement learning (RL) techniques in order to model crypto markets. This integration is designed to emulate, with a bottom-up approach to complexity inference, both individual and collective agents, ensuring robustness in the recent volatile conditions of such markets and during the COVID-19 era. A key feature of our model also lies in the fact that its autonomous agents perform asset price valuation based on two sources of information: the market prices themselves, and the approximation of the crypto assets fundamental values beyond what those market prices are. Our MAS calibration against real market data allows for an accurate emulation of crypto markets microstructure and probing key market behaviors, in both the bearish and bullish regimes of that particular time period.
- [1690] arXiv:2402.10805 (cross-list from cs.MM) [ pdf , ps , other ]
-
Title: Generative Cross-Modal Retrieval: Memorizing Images in Multimodal Language Models for Retrieval and BeyondSubjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
Abstract: The recent advancements in generative language models have demonstrated their ability to memorize knowledge from documents and recall knowledge to respond to user queries effectively. Building upon this capability, we propose to enable multimodal large language models (MLLMs) to memorize and recall images within their parameters. Given a user query for visual content, the MLLM is anticipated to "recall" the relevant image from its parameters as the response. Achieving this target presents notable challenges, including inbuilt visual memory and visual recall schemes within MLLMs. To address these challenges, we introduce a generative cross-modal retrieval framework, which assigns unique identifier strings to represent images and involves two training steps: learning to memorize and learning to retrieve. The first step focuses on training the MLLM to memorize the association between images and their respective identifiers. The latter step teaches the MLLM to generate the corresponding identifier of the target image, given the textual query input. By memorizing images in MLLMs, we introduce a new paradigm to cross-modal retrieval, distinct from previous discriminative approaches. The experiments demonstrate that the generative paradigm performs effectively and efficiently even with large-scale image candidate sets.
- [1691] arXiv:2402.10828 (cross-list from cs.RO) [ pdf , ps , html , other ]
-
Title: RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language ModelComments: 13 pages, 6 figuresSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: Robots powered by 'blackbox' models need to provide human-understandable explanations which we can trust. Hence, explainability plays a critical role in trustworthy autonomous decision-making to foster transparency and acceptance among end users, especially in complex autonomous driving. Recent advancements in Multi-Modal Large Language models (MLLMs) have shown promising potential in enhancing the explainability as a driving agent by producing control predictions along with natural language explanations. However, severe data scarcity due to expensive annotation costs and significant domain gaps between different datasets makes the development of a robust and generalisable system an extremely challenging task. Moreover, the prohibitively expensive training requirements of MLLM and the unsolved problem of catastrophic forgetting further limit their generalisability post-deployment. To address these challenges, we present RAG-Driver, a novel retrieval-augmented multi-modal large language model that leverages in-context learning for high-performance, explainable, and generalisable autonomous driving. By grounding in retrieved expert demonstration, we empirically validate that RAG-Driver achieves state-of-the-art performance in producing driving action explanations, justifications, and control signal prediction. More importantly, it exhibits exceptional zero-shot generalisation capabilities to unseen environments without further training endeavours.
- [1692] arXiv:2402.10837 (cross-list from cs.RO) [ pdf , ps , html , other ]
-
Title: Pedipulate: Enabling Manipulation Skills using a Quadruped Robot's LegComments: Project website: this https URLSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract: Legged robots have the potential to become vital in maintenance, home support, and exploration scenarios. In order to interact with and manipulate their environments, most legged robots are equipped with a dedicated robot arm, which means additional mass and mechanical complexity compared to standard legged robots. In this work, we explore pedipulation - using the legs of a legged robot for manipulation. By training a reinforcement learning policy that tracks position targets for one foot, we enable a dedicated pedipulation controller that is robust to disturbances, has a large workspace through whole-body behaviors, and can reach far-away targets with gait emergence, enabling loco-pedipulation. By deploying our controller on a quadrupedal robot using teleoperation, we demonstrate various real-world tasks such as door opening, sample collection, and pushing obstacles. We demonstrate load carrying of more than 2.0 kg at the foot. Additionally, the controller is robust to interaction forces at the foot, disturbances at the base, and slippery contact surfaces. Videos of the experiments are available at this https URL .
- [1693] arXiv:2402.10846 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: FedD2S: Personalized Data-Free Federated Knowledge DistillationKawa Atapour , S. Jamal Seyedmohammadi , Jamshid Abouei , Arash Mohammadi , Konstantinos N. PlataniotisSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Image and Video Processing (eess.IV)
Abstract: This paper addresses the challenge of mitigating data heterogeneity among clients within a Federated Learning (FL) framework. The model-drift issue, arising from the noniid nature of client data, often results in suboptimal personalization of a global model compared to locally trained models for each client. To tackle this challenge, we propose a novel approach named FedD2S for Personalized Federated Learning (pFL), leveraging knowledge distillation. FedD2S incorporates a deep-to-shallow layer-dropping mechanism in the data-free knowledge distillation process to enhance local model personalization. Through extensive simulations on diverse image datasets-FEMNIST, CIFAR10, CINIC0, and CIFAR100-we compare FedD2S with state-of-the-art FL baselines. The proposed approach demonstrates superior performance, characterized by accelerated convergence and improved fairness among clients. The introduced layer-dropping technique effectively captures personalized knowledge, resulting in enhanced performance compared to alternative FL models. Moreover, we investigate the impact of key hyperparameters, such as the participation ratio and layer-dropping rate, providing valuable insights into the optimal configuration for FedD2S. The findings demonstrate the efficacy of adaptive layer-dropping in the knowledge distillation process to achieve enhanced personalization and performance across diverse datasets and tasks.
- [1694] arXiv:2402.10884 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Multi-modal preference alignment remedies regression of visual instruction tuning on language modelSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: In production, multi-modal large language models (MLLMs) are expected to support multi-turn queries of interchanging image and text modalities. However, the current MLLMs trained with visual-question-answering (VQA) datasets could suffer from degradation, as VQA datasets lack the diversity and complexity of the original text instruction datasets which the underlying language model had been trained with. To address this challenging degradation, we first collect a lightweight (6k entries) VQA preference dataset where answers were annotated by Gemini for 5 quality metrics in a granular fashion, and investigate standard Supervised Fine-tuning, rejection sampling, Direct Preference Optimization (DPO), and SteerLM. Our findings indicate that the with DPO we are able to surpass instruction-following capabilities of the language model, achieving a 6.73 score on MT-Bench, compared to Vicuna's 6.57 and LLaVA's 5.99 despite small data scale. This enhancement in textual instruction proficiency correlates with boosted visual instruction performance (+4.9\% on MM-Vet, +6\% on LLaVA-Bench), with minimal alignment tax on visual knowledge benchmarks compared to previous RLHF approach. In conclusion, we propose a distillation-based multi-modal alignment model with fine-grained annotations on a small dataset that reconciles the textual and visual performance of MLLMs, restoring and boosting language capability after visual instruction tuning.
- [1695] arXiv:2402.10885 (cross-list from cs.RO) [ pdf , ps , html , other ]
-
Title: 3D Diffuser Actor: Policy Diffusion with 3D Scene RepresentationsComments: First two authors contributed equallySubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: We marry diffusion policies and 3D scene representations for robot manipulation. Diffusion policies learn the action distribution conditioned on the robot and environment state using conditional diffusion models. They have recently shown to outperform both deterministic and alternative state-conditioned action distribution learning methods. 3D robot policies use 3D scene feature representations aggregated from a single or multiple camera views using sensed depth. They have shown to generalize better than their 2D counterparts across camera viewpoints. We unify these two lines of work and present 3D Diffuser Actor, a neural policy architecture that, given a language instruction, builds a 3D representation of the visual scene and conditions on it to iteratively denoise 3D rotations and translations for the robot's end-effector. At each denoising iteration, our model represents end-effector pose estimates as 3D scene tokens and predicts the 3D translation and rotation error for each of them, by featurizing them using 3D relative attention to other 3D visual and language tokens. 3D Diffuser Actor sets a new state-of-the-art on RLBench with an absolute performance gain of 16.3% over the current SOTA on a multi-view setup and an absolute gain of 13.1% on a single-view setup. On the CALVIN benchmark, it outperforms the current SOTA in the setting of zero-shot unseen scene generalization by being able to successfully run 0.2 more tasks, a 7% relative increase. It also works in the real world from a handful of demonstrations. We ablate our model's architectural design choices, such as 3D scene featurization and 3D relative attentions, and show they all help generalization. Our results suggest that 3D scene representations and powerful generative modeling are keys to efficient robot learning from demonstrations.
- [1696] arXiv:2402.10890 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: When is Tree Search Useful for LLM Planning? It Depends on the DiscriminatorSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: In this paper, we examine how large language models (LLMs) solve multi-step problems under a language agent framework with three components: a generator, a discriminator, and a planning method. We investigate the practical utility of two advanced planning methods, iterative correction and tree search. We present a comprehensive analysis of how discrimination accuracy affects the overall performance of agents when using these two methods or a simpler method, re-ranking. Experiments on two tasks, text-to-SQL parsing and mathematical reasoning, show that: (1) advanced planning methods demand discriminators with at least 90% accuracy to achieve significant improvements over re-ranking; (2) current LLMs' discrimination abilities have not met the needs of advanced planning methods to achieve such improvements; (3) with LLM-based discriminators, advanced planning methods may not adequately balance accuracy and efficiency. For example, compared to the other two methods, tree search is at least 10--20 times slower but leads to negligible performance gains, which hinders its real-world applications. Code and data will be released at this https URL .
- [1697] arXiv:2402.10891 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Instruction Diversity Drives Generalization To Unseen TasksSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Instruction tuning -- fine-tuning a large language model (LLM) on pairs of instructions and desired outcomes -- is an approach that enables pre-trained language models to perform real-world tasks and follow human instructions. Its practical success depends on the model learning a broader set of instructions than those it was trained on. Yet the factors that determine model generalization to such \emph{unseen tasks} are not well understood. %To understand the driving factors of generalization, In this paper, we experiment with string rewrites, a symbolic task that serves as a building block for Turing complete Markov algorithms while allowing experimental control of "inputs" and "instructions". We investigate the trade-off between the number of instructions the model is trained on and the number of training samples provided for each instruction and observe that the diversity of the instruction set determines generalization. Generalization emerges once a diverse enough set of tasks is provided, even though very few examples are provided for each task. Instruction diversity also ensures robustness with respect to non-uniform distributions of instructions in the training set.
- [1698] arXiv:2402.10893 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: RLVF: Learning from Verbal Feedback without OvergeneralizationMoritz Stephan , Alexander Khazatsky , Eric Mitchell , Annie S Chen , Sheryl Hsu , Archit Sharma , Chelsea FinnComments: 9 pages, 9 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: The diversity of contexts in which large language models (LLMs) are deployed requires the ability to modify or customize default model behaviors to incorporate nuanced requirements and preferences. A convenient interface to specify such model adjustments is high-level verbal feedback, such as "Don't use emojis when drafting emails to my boss." However, while writing high-level feedback is far simpler than collecting annotations for reinforcement learning from human feedback (RLHF), we find that simply prompting a model with such feedback leads to overgeneralization of the feedback to contexts where it is not relevant. We study the problem of incorporating verbal feedback without such overgeneralization, inspiring a new method Contextualized Critiques with Constrained Preference Optimization (C3PO). C3PO uses a piece of high-level feedback to generate a small synthetic preference dataset specifying how the feedback should (and should not) be applied. It then fine-tunes the model in accordance with the synthetic preference data while minimizing the divergence from the original model for prompts where the feedback does not apply. Our experimental results indicate that our approach effectively applies verbal feedback to relevant scenarios while preserving existing behaviors for other contexts. For both human- and GPT-4-generated high-level feedback, C3PO effectively adheres to the given feedback comparably to in-context baselines while reducing overgeneralization by 30%.
- [1699] arXiv:2402.10908 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: LLM-Assisted Crisis Management: Building Advanced LLM Platforms for Effective Emergency Response and Public CollaborationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Abstract: Emergencies and critical incidents often unfold rapidly, necessitating a swift and effective response. In this research, we introduce a novel approach to identify and classify emergency situations from social media posts and direct emergency messages using an open source Large Language Model, LLAMA2. The goal is to harness the power of natural language processing and machine learning to assist public safety telecommunicators and huge crowds during countrywide emergencies. Our research focuses on developing a language model that can understand users describe their situation in the 911 call, enabling LLAMA2 to analyze the content and offer relevant instructions to the telecommunicator, while also creating workflows to notify government agencies with the caller's information when necessary. Another benefit this language model provides is its ability to assist people during a significant emergency incident when the 911 system is overwhelmed, by assisting the users with simple instructions and informing authorities with their location and emergency information.
- [1700] arXiv:2402.10920 (cross-list from cs.AR) [ pdf , ps , html , other ]
-
Title: Designing Silicon Brains using LLM: Leveraging ChatGPT for Automated Description of a Spiking Neuron ArraySubjects: Hardware Architecture (cs.AR) ; Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET)
Abstract: Large language models (LLMs) have made headlines for synthesizing correct-sounding responses to a variety of prompts, including code generation. In this paper, we present the prompts used to guide ChatGPT4 to produce a synthesizable and functional verilog description for the entirety of a programmable Spiking Neuron Array ASIC. This design flow showcases the current state of using ChatGPT4 for natural language driven hardware design. The AI-generated design was verified in simulation using handcrafted testbenches and has been submitted for fabrication in Skywater 130nm through Tiny Tapeout 5 using an open-source EDA flow.
- [1701] arXiv:2402.10930 (cross-list from cs.AR) [ pdf , ps , html , other ]
-
Title: ConSmax: Hardware-Friendly Alternative Softmax with Learnable ParametersShiwei Liu , Guanchen Tao , Yifei Zou , Derek Chow , Zichen Fan , Kauna Lei , Bangfei Pan , Dennis Sylvester , Gregory Kielian , Mehdi SaliganeSubjects: Hardware Architecture (cs.AR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The self-attention mechanism sets transformer-based large language model (LLM) apart from the convolutional and recurrent neural networks. Despite the performance improvement, achieving real-time LLM inference on silicon is challenging due to the extensively used Softmax in self-attention. Apart from the non-linearity, the low arithmetic intensity greatly reduces the processing parallelism, which becomes the bottleneck especially when dealing with a longer context. To address this challenge, we propose Constant Softmax (ConSmax), a software-hardware co-design as an efficient Softmax alternative. ConSmax employs differentiable normalization parameters to remove the maximum searching and denominator summation in Softmax. It allows for massive parallelization while performing the critical tasks of Softmax. In addition, a scalable ConSmax hardware utilizing a bitwidth-split look-up table (LUT) can produce lossless non-linear operation and support mix-precision computing. It further facilitates efficient LLM inference. Experimental results show that ConSmax achieves a minuscule power consumption of 0.43 mW and area of 0.001 mm2 at 1-GHz working frequency and 22-nm CMOS technology. Compared to state-of-the-art Softmax hardware, ConSmax results in 14.5x energy and 14.0x area savings with a comparable accuracy on a GPT-2 model and the WikiText103 dataset.
- [1702] arXiv:2402.10937 (cross-list from cs.AR) [ pdf , ps , other ]
-
Title: A Lightweight Inception Boosted U-Net Neural Network for Routability PredictionComments: The paper is submitted to the International Symposium of EDA (2024, XiAn, China)Subjects: Hardware Architecture (cs.AR) ; Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
Abstract: As the modern CPU, GPU, and NPU chip design complexity and transistor counts keep increasing, and with the relentless shrinking of semiconductor technology nodes to nearly 1 nanometer, the placement and routing have gradually become the two most pivotal processes in modern very-large-scale-integrated (VLSI) circuit back-end design. How to evaluate routability efficiently and accurately in advance (at the placement and global routing stages) has grown into a crucial research area in the field of artificial intelligence (AI) assisted electronic design automation (EDA). In this paper, we propose a novel U-Net variant model boosted by an Inception embedded module to predict Routing Congestion (RC) and Design Rule Checking (DRC) hotspots. Experimental results on the recently published CircuitNet dataset benchmark show that our proposed method achieves up to 5% (RC) and 20% (DRC) rate reduction in terms of Avg-NRMSE (Average Normalized Root Mean Square Error) compared to the classic architecture. Furthermore, our approach consistently outperforms the prior model on the SSIM (Structural Similarity Index Measure) metric.
- [1703] arXiv:2402.10940 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Neural machine translation of clinical procedure codes for medical diagnosis and uncertainty quantificationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: A Clinical Decision Support System (CDSS) is designed to enhance clinician decision-making by combining system-generated recommendations with medical expertise. Given the high costs, intensive labor, and time-sensitive nature of medical treatments, there is a pressing need for efficient decision support, especially in complex emergency scenarios. In these scenarios, where information can be limited, an advanced CDSS framework that leverages AI (artificial intelligence) models to effectively reduce diagnostic uncertainty has utility. Such an AI-enabled CDSS framework with quantified uncertainty promises to be practical and beneficial in the demanding context of real-world medical care. In this study, we introduce the concept of Medical Entropy, quantifying uncertainties in patient outcomes predicted by neural machine translation based on the ICD-9 code of procedures. Our experimental results not only show strong correlations between procedure and diagnosis sequences based on the simple ICD-9 code but also demonstrate the promising capacity to model trends of uncertainties during hospitalizations through a data-driven approach.
- [1704] arXiv:2402.10941 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Text2Data: Low-Resource Data Generation with Textual ControlShiyu Wang , Yihao Feng , Tian Lan , Ning Yu , Yu Bai , Ran Xu , Huan Wang , Caiming Xiong , Silvio SavareseComments: We propose a method that can achieve text-to-data generation under low-resource situationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Natural language serves as a common and straightforward control signal for humans to interact seamlessly with machines. Recognizing the importance of this interface, the machine learning community is investing considerable effort in generating data that is semantically coherent with textual instructions. While strides have been made in text-to-data generation spanning image editing, audio synthesis, video creation, and beyond, low-resource areas characterized by expensive annotations or complex data structures, such as molecules, motion dynamics, and time series, often lack textual labels. This deficiency impedes supervised learning, thereby constraining the application of advanced generative models for text-to-data tasks. In response to these challenges in the low-resource scenario, we propose Text2Data, a novel approach that utilizes unlabeled data to understand the underlying data distribution through an unsupervised diffusion model. Subsequently, it undergoes controllable finetuning via a novel constraint optimization-based learning objective that ensures controllability and effectively counteracts catastrophic forgetting. Comprehensive experiments demonstrate that Text2Data is able to achieve enhanced performance regarding controllability across various modalities, including molecules, motions and time series, when compared to existing baselines.
- [1705] arXiv:2402.10946 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: CultureLLM: Incorporating Cultural Differences into Large Language ModelsComments: Technical Report; 26 pagesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Large language models (LLMs) are reported to be partial to certain cultures owing to the training data dominance from the English corpora. Since multilingual cultural data are often expensive to collect, existing efforts handle this by prompt engineering or culture-specific pre-training. However, they might overlook the knowledge deficiency of low-resource culture and require extensive computing resources. In this paper, we propose CultureLLM, a cost-effective solution to incorporate cultural differences into LLMs. CultureLLM adopts World Value Survey (WVS) as seed data and generates semantically equivalent training data via the proposed semantic data augmentation. Using only 50 seed samples from WVS with augmented data, we fine-tune culture-specific LLMs and one unified model (CultureLLM-One) for 9 cultures covering rich and low-resource languages. Extensive experiments on 60 culture-related datasets demonstrate that CultureLLM significantly outperforms various counterparts such as GPT-3.5 (by 8.1%) and Gemini Pro (by 9.5%) with comparable performance to GPT-4 or even better. Our human study shows that the generated samples are semantically equivalent to the original samples, providing an effective solution for LLMs augmentation.
- [1706] arXiv:2402.10948 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Zero-shot Explainable Mental Health Analysis on Social Media by Incorporating Mental ScalesComments: 4 pages,2 figuresJournal-ref: The Web Conference (WWW) 2024, Short PaperSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Traditional discriminative approaches in mental health analysis are known for their strong capacity but lack interpretability and demand large-scale annotated data. The generative approaches, such as those based on large language models (LLMs), have the potential to get rid of heavy annotations and provide explanations but their capabilities still fall short compared to discriminative approaches, and their explanations may be unreliable due to the fact that the generation of explanation is a black-box process. Inspired by the psychological assessment practice of using scales to evaluate mental states, our method which is called Mental Analysis by Incorporating Mental Scales (MAIMS), incorporates two procedures via LLMs. First, the patient completes mental scales, and second, the psychologist interprets the collected information from the mental scales and makes informed decisions. Experimental results show that MAIMS outperforms other zero-shot methods. MAIMS can generate more rigorous explanation based on the outputs of mental scales
- [1707] arXiv:2402.10949 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: The Unreasonable Effectiveness of Eccentric Automatic PromptsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Large Language Models (LLMs) have demonstrated remarkable problem-solving and basic mathematics abilities. However, their efficacy is highly contingent on the formulation of the prompt. This study endeavors to quantify the influence of incorporating "positive thinking" into the system message of the prompt, then compare that to systematic prompt optimization. We assess the performance of 60 combinations of system message snippets, tested with and without Chain of Thought prompting, across three models with parameters ranging from 7 to 70 billion on the GSM8K dataset. Our findings reveal that results do not universally generalize across models. In most instances, the inclusion of "positive thinking" prompts positively affected model performance. Notably, however, Llama2-70B exhibited an exception when not utilizing Chain of Thought, as the optimal system message was found to be none at all. Given the combinatorial complexity, and thus computation time, of experimenting with hand-tuning prompts for large black-box models, we then compared the performance of the best "positive thinking" prompt against the output of systematic prompt optimization. We show that employing an automated prompt optimizer emerges as the most effective method for enhancing performance, even when working with smaller open-source models. Additionally, our findings reveal that the highest-scoring, automatically-optimized prompt exhibits a degree of peculiarity far beyond expectations.
- [1708] arXiv:2402.10958 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse PromptsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: In the field of large language models (LLMs), aligning models with the diverse preferences of users is a critical challenge. Direct Preference Optimization (DPO) has played a key role in this area. It works by using pairs of preferences derived from the same prompts, and it functions without needing an additional reward model. However, DPO does not fully reflect the complex nature of human learning, which often involves understanding contrasting responses to not only identical but also similar questions. To overcome this shortfall, we propose Relative Preference Optimization (RPO). RPO is designed to discern between more and less preferred responses derived from both identical and related prompts. It introduces a contrastive weighting mechanism, enabling the tuning of LLMs using a broader range of preference data, including both paired and unpaired sets. This approach expands the learning capabilities of the model, allowing it to leverage insights from a more varied set of prompts. Through empirical tests, including dialogue and summarization tasks, and evaluations using the AlpacaEval2.0 leaderboard, RPO has demonstrated a superior ability to align LLMs with user preferences and to improve their adaptability during the training process. The PyTorch code necessary to reproduce the results presented in the paper will be made available on GitHub for public access.
- [1709] arXiv:2402.10962 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Measuring and Controlling Instruction (In)Stability in Language Model DialogsKenneth Li , Tianle Liu , Naomi Bashkansky , David Bau , Fernanda Viégas , Hanspeter Pfister , Martin WattenbergComments: Code: this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: System-prompting is a standard tool for customizing language-model chatbots, enabling them to follow a specific instruction. An implicit assumption in the use of system prompts is that they will be stable, so the chatbot will continue to generate text according to the stipulated instructions for the duration of a conversation. We propose a quantitative benchmark to test this assumption, evaluating instruction stability via self-chats between two instructed chatbots. Testing popular models like LLaMA2-chat-70B and GPT-3.5, we reveal a significant instruction drift within eight rounds of conversations. An empirical and theoretical analysis of this phenomenon suggests the transformer attention mechanism plays a role, due to attention decay over long exchanges. To combat attention decay and instruction drift, we propose a lightweight method called split-softmax, which compares favorably against two strong baselines.
- [1710] arXiv:2402.10977 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Generative AI and Process Systems Engineering: The Next FrontierSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Systems and Control (eess.SY); Optimization and Control (math.OC)
Abstract: This article explores how emerging generative artificial intelligence (GenAI) models, such as large language models (LLMs), can enhance solution methodologies within process systems engineering (PSE). These cutting-edge GenAI models, particularly foundation models (FMs), which are pre-trained on extensive, general-purpose datasets, offer versatile adaptability for a broad range of tasks, including responding to queries, image generation, and complex decision-making. Given the close relationship between advancements in PSE and developments in computing and systems technologies, exploring the synergy between GenAI and PSE is essential. We begin our discussion with a compact overview of both classic and emerging GenAI models, including FMs, and then dive into their applications within key PSE domains: synthesis and design, optimization and integration, and process monitoring and control. In each domain, we explore how GenAI models could potentially advance PSE methodologies, providing insights and prospects for each area. Furthermore, the article identifies and discusses potential challenges in fully leveraging GenAI within PSE, including multiscale modeling, data requirements, evaluation metrics and benchmarks, and trust and safety, thereby deepening the discourse on effective GenAI integration into systems analysis, design, optimization, operations, monitoring, and control. This paper provides a guide for future research focused on the applications of emerging GenAI in PSE.
- [1711] arXiv:2402.10978 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Language Models with Conformal Factuality GuaranteesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Guaranteeing the correctness and factuality of language model (LM) outputs is a major open problem. In this work, we propose conformal factuality, a framework that can ensure high probability correctness guarantees for LMs by connecting language modeling and conformal prediction. We observe that the correctness of an LM output is equivalent to an uncertainty quantification problem, where the uncertainty sets are defined as the entailment set of an LM's output. Using this connection, we show that conformal prediction in language models corresponds to a back-off algorithm that provides high probability correctness guarantees by progressively making LM outputs less specific (and expanding the associated uncertainty sets). This approach applies to any black-box LM and requires very few human-annotated samples. Evaluations of our approach on closed book QA (FActScore, NaturalQuestions) and reasoning tasks (MATH) show that our approach can provide 80-90% correctness guarantees while retaining the majority of the LM's original output.
- [1712] arXiv:2402.10979 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: SportsMetrics: Blending Text and Numerical Data to Understand Information Fusion in LLMsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large language models hold significant potential for integrating various data types, such as text documents and database records, for advanced analytics. However, blending text and numerical data presents substantial challenges. LLMs need to process and cross-reference entities and numbers, handle data inconsistencies and redundancies, and develop planning capabilities such as building a working memory for managing complex data queries. In this paper, we introduce four novel tasks centered around sports data analytics to evaluate the numerical reasoning and information fusion capabilities of LLMs. These tasks involve providing LLMs with detailed, play-by-play sports game descriptions, then challenging them with adversarial scenarios such as new game rules, longer durations, scrambled narratives, and analyzing key statistics in game summaries. We conduct extensive experiments on NBA and NFL games to assess the performance of LLMs on these tasks. Our benchmark, SportsMetrics, introduces a new mechanism for assessing LLMs' numerical reasoning and fusion skills.
- [1713] arXiv:2402.10980 (cross-list from physics.chem-ph) [ pdf , ps , html , other ]
-
Title: ChemReasoner: Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical FeedbackHenry W. Sprueill , Carl Edwards , Khushbu Agarwal , Mariefel V. Olarte , Udishnu Sanyal , Conrad Johnston , Hongbin Liu , Heng Ji , Sutanay ChoudhuryComments: 8 pages, accepted by ICML 2024Subjects: Chemical Physics (physics.chem-ph) ; Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
Abstract: The discovery of new catalysts is essential for the design of new and more efficient chemical processes in order to transition to a sustainable future. We introduce an AI-guided computational screening framework unifying linguistic reasoning with quantum-chemistry based feedback from 3D atomistic representations. Our approach formulates catalyst discovery as an uncertain environment where an agent actively searches for highly effective catalysts via the iterative combination of large language model (LLM)-derived hypotheses and atomistic graph neural network (GNN)-derived feedback. Identified catalysts in intermediate search steps undergo structural evaluation based on spatial orientation, reaction pathways, and stability. Scoring functions based on adsorption energies and barriers steer the exploration in the LLM's knowledge space toward energetically favorable, high-efficiency catalysts. We introduce planning methods that automatically guide the exploration without human input, providing competitive performance against expert-enumerated chemical descriptor-based implementations. By integrating language-guided reasoning with computational chemistry feedback, our work pioneers AI-accelerated, trustworthy catalyst discovery.
- [1714] arXiv:2402.10985 (cross-list from cs.CR) [ pdf , ps , html , other ]
-
Title: Leveraging AI Planning For Detecting Cloud Security VulnerabilitiesSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: Cloud computing services provide scalable and cost-effective solutions for data storage, processing, and collaboration. Alongside their growing popularity, concerns related to their security vulnerabilities leading to data breaches and sophisticated attacks such as ransomware are growing. To address these, first, we propose a generic framework to express relations between different cloud objects such as users, datastores, security roles, to model access control policies in cloud systems. Access control misconfigurations are often the primary driver for cloud attacks. Second, we develop a PDDL model for detecting security vulnerabilities which can for example lead to widespread attacks such as ransomware, sensitive data exfiltration among others. A planner can then generate attacks to identify such vulnerabilities in the cloud. Finally, we test our approach on 14 real Amazon AWS cloud configurations of different commercial organizations. Our system can identify a broad range of security vulnerabilities, which state-of-the-art industry tools cannot detect.
- [1715] arXiv:2402.10986 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: FinTral: A Family of GPT-4 Level Multimodal Financial Large Language ModelsComments: Submitted to ACL 2024 (under review)Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: We introduce FinTral, a suite of state-of-the-art multimodal large language models (LLMs) built upon the Mistral-7b model and tailored for financial analysis. FinTral integrates textual, numerical, tabular, and image data. We enhance FinTral with domain-specific pretraining, instruction fine-tuning, and RLAIF training by exploiting a large collection of textual and visual datasets we curate for this work. We also introduce an extensive benchmark featuring nine tasks and 25 datasets for evaluation, including hallucinations in the financial domain. Our FinTral model trained with direct preference optimization employing advanced Tools and Retrieval methods, dubbed FinTral-DPO-T&R, demonstrates an exceptional zero-shot performance. It outperforms ChatGPT-3.5 in all tasks and surpasses GPT-4 in five out of nine tasks, marking a significant advancement in AI-driven financial technology. We also demonstrate that FinTral has the potential to excel in real-time analysis and decision-making in diverse financial contexts.
- [1716] arXiv:2402.10987 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: WilKE: Wise-Layer Knowledge Editor for Lifelong Knowledge EditingSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Knowledge editing aims to rectify inaccuracies in large language models (LLMs) without costly retraining for outdated or erroneous knowledge. However, current knowledge editing methods primarily focus on single editing, failing to meet the requirements for lifelong editing. In this paper, lifelong editing is synonymous with lifelong knowledge editing. This study reveals a performance degradation encountered by knowledge editing in lifelong editing, characterized by toxicity buildup and toxicity flash, with the primary cause identified as pattern unmatch. We introduce a knowledge editing approach named WilKE, which selects editing layer based on the pattern matching degree of editing knowledge across different layers. Experimental results demonstrate that, in lifelong editing, WilKE exhibits an average improvement of 46.2\% and 67.8\% on editing GPT2-XL and GPT-J relative to state-of-the-art knowledge editing methods.
- [1717] arXiv:2402.10991 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Enhancing Convergence in Federated Learning: A Contribution-Aware Asynchronous ApproachComments: 5 pages, 1 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Federated Learning (FL) is a distributed machine learning paradigm that allows clients to train models on their data while preserving their privacy. FL algorithms, such as Federated Averaging (FedAvg) and its variants, have been shown to converge well in many scenarios. However, these methods require clients to upload their local updates to the server in a synchronous manner, which can be slow and unreliable in realistic FL settings. To address this issue, researchers have developed asynchronous FL methods that allow clients to continue training on their local data using a stale global model. However, most of these methods simply aggregate all of the received updates without considering their relative contributions, which can slow down convergence. In this paper, we propose a contribution-aware asynchronous FL method that takes into account the staleness and statistical heterogeneity of the received updates. Our method dynamically adjusts the contribution of each update based on these factors, which can speed up convergence compared to existing methods.
- [1718] arXiv:2402.10992 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: "Understanding AI": Semantic Grounding in Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Do LLMs understand the meaning of the texts they generate? Do they possess a semantic grounding? And how could we understand whether and what they understand? I start the paper with the observation that we have recently witnessed a generative turn in AI, since generative models, including LLMs, are key for self-supervised learning. To assess the question of semantic grounding, I distinguish and discuss five methodological ways. The most promising way is to apply core assumptions of theories of meaning in philosophy of mind and language to LLMs. Grounding proves to be a gradual affair with a three-dimensional distinction between functional, social and causal grounding. LLMs show basic evidence in all three dimensions. A strong argument is that LLMs develop world models. Hence, LLMs are neither stochastic parrots nor semantic zombies, but already understand the language they generate, at least in an elementary sense.
- [1719] arXiv:2402.10998 (cross-list from cs.SY) [ pdf , ps , other ]
-
Title: Provably Safe Neural Network Controllers via Differential Dynamic LogicComments: 42 pages (main paper has 19 pages), 12 figuresSubjects: Systems and Control (eess.SY) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Logic in Computer Science (cs.LO)
Abstract: While neural networks (NNs) have a large potential as goal-oriented controllers for Cyber-Physical Systems, verifying the safety of neural network based control systems (NNCSs) poses significant challenges for the practical use of NNs -- especially when safety is needed for unbounded time horizons. One reason for this is the intractability of NN and hybrid system analysis. We introduce VerSAILLE (Verifiably Safe AI via Logically Linked Envelopes): The first approach for the combination of differential dynamic logic (dL) and NN verification. By joining forces, we can exploit the efficiency of NN verification tools while retaining the rigor of dL. We reflect a safety proof for a controller envelope in an NN to prove the safety of concrete NNCS on an infinite-time horizon. The NN verification properties resulting from VerSAILLE typically require nonlinear arithmetic while efficient NN verification tools merely support linear arithmetic. To overcome this divide, we present Mosaic: The first sound and complete verification approach for polynomial real arithmetic properties on piece-wise linear NNs. Mosaic lifts off-the-shelf tools for linear properties to the nonlinear setting. An evaluation on case studies, including adaptive cruise control and airborne collision avoidance, demonstrates the versatility of VerSAILLE and Mosaic: It supports the certification of infinite-time horizon safety and the exhaustive enumeration of counterexample regions while significantly outperforming State-of-the-Art tools in closed-loop NNV.
- [1720] arXiv:2402.10999 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Analysis and Mortality Prediction using Multiclass Classification for Older Adults with Type 2 DiabetesComments: 146 PagesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Designing proper treatment plans to manage diabetes requires health practitioners to pay heed to the individuals remaining life along with the comorbidities affecting them. Older adults with Type 2 Diabetes Mellitus (T2DM) are prone to experience premature death or even hypoglycaemia. The structured dataset utilized has 68 potential mortality predictors for 275,190 diabetic U.S. military Veterans aged 65 years or older. A new target variable is invented by combining the two original target variables. Outliers are handled by discretizing the continuous variables. Categorical variables have been dummy encoded. Class balancing is achieved by random under-sampling. A benchmark regression model is built using Multinomial Logistic Regression with LASSO. Chi-Squared and Information Gain are the filter-based feature selection techniques utilized. Classifiers such as Multinomial Logistic Regression, Random Forest, Extreme Gradient Boosting (XGBoost), and One-vs-Rest classifier are employed to build various models. Contrary to expectations, all the models have constantly underperformed. XGBoost has given the highest accuracy of 53.03 percent with Chi-Squared feature selection. All the models have consistently shown an acceptable performance for Class 3 (remaining life is more than 10 years), significantly low for Class 1 (remaining life is up to 5 years), and the worst for Class 2 (remaining life is more than 5 but up to 10 years). Features analysis has deduced that almost all input variables are associated with multiple target classes. The high dimensionality of the input data after dummy encoding seems to have confused the models, leading to misclassifications. The approach taken in this study is ineffective in producing a high-performing predictive model but lays a foundation as this problem has never been viewed from a multiclass classification perspective.
- [1721] arXiv:2402.11000 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: ASGEA: Exploiting Logic Rules from Align-Subgraphs for Entity AlignmentComments: Ongoing work; 16 pages, 9 Tables, 8 Figures; Code: this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Entity alignment (EA) aims to identify entities across different knowledge graphs that represent the same real-world objects. Recent embedding-based EA methods have achieved state-of-the-art performance in EA yet faced interpretability challenges as they purely rely on the embedding distance and neglect the logic rules behind a pair of aligned entities. In this paper, we propose the Align-Subgraph Entity Alignment (ASGEA) framework to exploit logic rules from Align-Subgraphs. ASGEA uses anchor links as bridges to construct Align-Subgraphs and spreads along the paths across KGs, which distinguishes it from the embedding-based methods. Furthermore, we design an interpretable Path-based Graph Neural Network, ASGNN, to effectively identify and integrate the logic rules across KGs. We also introduce a node-level multi-modal attention mechanism coupled with multi-modal enriched anchors to augment the Align-Subgraph. Our experimental results demonstrate the superior performance of ASGEA over the existing embedding-based methods in both EA and Multi-Modal EA (MMEA) tasks.
- [1722] arXiv:2402.11005 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Exploring Value Biases: How LLMs Deviate Towards the IdealSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large-Language-Models (LLMs) are deployed in a wide range of applications, and their response has an increasing social impact. Understanding the non-deliberate(ive) mechanism of LLMs in giving responses is essential in explaining their performance and discerning their biases in real-world applications. This is analogous to human studies, where such inadvertent responses are referred to as sampling. We study this sampling of LLMs in light of value bias and show that the sampling of LLMs tends to favour high-value options. Value bias corresponds to this shift of response from the most likely towards an ideal value represented in the LLM. In fact, this effect can be reproduced even with new entities learnt via in-context prompting. We show that this bias manifests in unexpected places and has implications on relevant application scenarios, like choosing exemplars. The results show that value bias is strong in LLMs across different categories, similar to the results found in human studies.
- [1723] arXiv:2402.11051 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Large Language Models Fall Short: Understanding Complex Relationships in Detective NarrativesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Existing datasets for narrative understanding often fail to represent the complexity and uncertainty of relationships in real-life social scenarios. To address this gap, we introduce a new benchmark, Conan, designed for extracting and analysing intricate character relation graphs from detective narratives. Specifically, we designed hierarchical relationship categories and manually extracted and annotated role-oriented relationships from the perspectives of various characters, incorporating both public relationships known to most characters and secret ones known to only a few. Our experiments with advanced Large Language Models (LLMs) like GPT-3.5, GPT-4, and Llama2 reveal their limitations in inferencing complex relationships and handling longer narratives. The combination of the Conan dataset and our pipeline strategy is geared towards understanding the ability of LLMs to comprehend nuanced relational dynamics in narrative contexts.
- [1724] arXiv:2402.11060 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Persona-DB: Efficient Large Language Model Personalization for Response Prediction with Collaborative Data RefinementChenkai Sun , Ke Yang , Revanth Gangi Reddy , Yi R. Fung , Hou Pong Chan , ChengXiang Zhai , Heng JiSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract: The increasing demand for personalized interactions with large language models (LLMs) calls for the development of methodologies capable of accurately and efficiently identifying user opinions and preferences. Retrieval augmentation emerges as an effective strategy, as it can accommodate a vast number of users without the costs from fine-tuning. Existing research, however, has largely focused on enhancing the retrieval stage and devoted limited exploration toward optimizing the representation of the database, a crucial aspect for tasks such as personalization. In this work, we examine the problem from a novel angle, focusing on how data can be better represented for more efficient retrieval in the context of LLM customization. To tackle this challenge, we introduce Persona-DB, a simple yet effective framework consisting of a hierarchical construction process to improve generalization across task contexts and collaborative refinement to effectively bridge knowledge gaps among users. In the task of response forecasting, Persona-DB demonstrates superior efficiency in maintaining accuracy with a significantly reduced retrieval size, a critical advantage in scenarios with extensive histories or limited context windows. Our experiments also indicate a marked improvement of over 15% under cold-start scenarios, when users have extremely sparse data. Furthermore, our analysis reveals the increasing importance of collaborative knowledge as the retrieval capacity expands.
- [1725] arXiv:2402.11068 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Bridging Causal Discovery and Large Language Models: A Comprehensive Survey of Integrative Approaches and Future DirectionsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Causal discovery (CD) and Large Language Models (LLMs) represent two emerging fields of study with significant implications for artificial intelligence. Despite their distinct origins, CD focuses on uncovering cause-effect relationships from data, and LLMs on processing and generating humanlike text, the convergence of these domains offers novel insights and methodologies for understanding complex systems. This paper presents a comprehensive survey of the integration of LLMs, such as GPT4, into CD tasks. We systematically review and compare existing approaches that leverage LLMs for various CD tasks and highlight their innovative use of metadata and natural language to infer causal structures. Our analysis reveals the strengths and potential of LLMs in both enhancing traditional CD methods and as an imperfect expert, alongside the challenges and limitations inherent in current practices. Furthermore, we identify gaps in the literature and propose future research directions aimed at harnessing the full potential of LLMs in causality research. To our knowledge, this is the first survey to offer a unified and detailed examination of the synergy between LLMs and CD, setting the stage for future advancements in the field.
- [1726] arXiv:2402.11073 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: AFaCTA: Assisting the Annotation of Factual Claim Detection with Reliable LLM AnnotatorsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: With the rise of generative AI, automated fact-checking methods to combat misinformation are becoming more and more important. However, factual claim detection, the first step in a fact-checking pipeline, suffers from two key issues that limit its scalability and generalizability: (1) inconsistency in definitions of the task and what a claim is, and (2) the high cost of manual annotation. To address (1), we review the definitions in related work and propose a unifying definition of factual claims that focuses on verifiability. To address (2), we introduce AFaCTA (Automatic Factual Claim deTection Annotator), a novel framework that assists in the annotation of factual claims with the help of large language models (LLMs). AFaCTA calibrates its annotation confidence with consistency along three predefined reasoning paths. Extensive evaluation and experiments in the domain of political speech reveal that AFaCTA can efficiently assist experts in annotating factual claims and training high-quality classifiers, and can work with or without expert supervision. Our analyses also result in PoliClaim, a comprehensive claim detection dataset spanning diverse political topics.
- [1727] arXiv:2402.11078 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Model Editing by Pure Fine-TuningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Fine-tuning is dismissed as not effective for model editing due to its poor performance compared to more specialized methods. However, fine-tuning is simple, agnostic to the architectural details of the model being edited, and able to leverage ongoing advances in standard training methods (e.g., PEFT), making it an appealing choice for a model editor. In this work, we show that pure fine-tuning can be a viable approach to model editing. We propose a slight modification of naive fine-tuning with two key ingredients. First, we optimize the conditional likelihood rather than the full likelihood. Second, we augment the data with random paraphrases and facts to encourage generalization and locality. Our experiments on ZsRE and CounterFact show that this simple modification allows fine-tuning to often match or outperform specialized editors in the edit score.
- [1728] arXiv:2402.11082 (cross-list from cs.CR) [ pdf , ps , html , other ]
-
Title: The AI Security Pyramid of PainComments: SPIE DCS 2024Subjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: We introduce the AI Security Pyramid of Pain, a framework that adapts the cybersecurity Pyramid of Pain to categorize and prioritize AI-specific threats. This framework provides a structured approach to understanding and addressing various levels of AI threats. Starting at the base, the pyramid emphasizes Data Integrity, which is essential for the accuracy and reliability of datasets and AI models, including their weights and parameters. Ensuring data integrity is crucial, as it underpins the effectiveness of all AI-driven decisions and operations. The next level, AI System Performance, focuses on MLOps-driven metrics such as model drift, accuracy, and false positive rates. These metrics are crucial for detecting potential security breaches, allowing for early intervention and maintenance of AI system integrity. Advancing further, the pyramid addresses the threat posed by Adversarial Tools, identifying and neutralizing tools used by adversaries to target AI systems. This layer is key to staying ahead of evolving attack methodologies. At the Adversarial Input layer, the framework addresses the detection and mitigation of inputs designed to deceive or exploit AI models. This includes techniques like adversarial patterns and prompt injection attacks, which are increasingly used in sophisticated attacks on AI systems. Data Provenance is the next critical layer, ensuring the authenticity and lineage of data and models. This layer is pivotal in preventing the use of compromised or biased data in AI systems. At the apex is the tactics, techniques, and procedures (TTPs) layer, dealing with the most complex and challenging aspects of AI security. This involves a deep understanding and strategic approach to counter advanced AI-targeted attacks, requiring comprehensive knowledge and planning.
- [1729] arXiv:2402.11089 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: The Male CEO and the Female Assistant: Probing Gender Biases in Text-To-Image Models Through Paired Stereotype TestSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: Recent large-scale Text-To-Image (T2I) models such as DALLE-3 demonstrate great potential in new applications, but also face unprecedented fairness challenges. Prior studies revealed gender biases in single-person image generation, but T2I model applications might require portraying two or more people simultaneously. Potential biases in this setting remain unexplored, leading to fairness-related risks in usage. To study these underlying facets of gender biases in T2I models, we propose a novel Paired Stereotype Test (PST) bias evaluation framework. PST prompts the model to generate two individuals in the same image. They are described with two social identities that are stereotypically associated with the opposite gender. Biases can then be measured by the level of conformation to gender stereotypes in generated images. Using PST, we evaluate DALLE-3 from 2 perspectives: biases in gendered occupation and biases in organizational power. Despite seemingly fair or even anti-stereotype single-person generations, PST still unveils gendered occupational and power associations. Moreover, compared to single-person settings, DALLE-3 generates noticeably more masculine figures under PST for individuals with male-stereotypical identities. PST is therefore effective in revealing underlying gender biases in DALLE-3 that single-person settings cannot capture. Our findings reveal the complicated patterns of gender biases in modern T2I models, further highlighting the critical fairness challenges in multimodal generative systems.
- [1730] arXiv:2402.11104 (cross-list from cs.GT) [ pdf , ps , html , other ]
-
Title: Computing Voting Rules with Elicited Incomplete VotesSubjects: Computer Science and Game Theory (cs.GT) ; Artificial Intelligence (cs.AI)
Abstract: Motivated by the difficulty of specifying complete ordinal preferences over a large set of $m$ candidates, we study voting rules that are computable by querying voters about $t < m$ candidates. Generalizing prior works that focused on specific instances of this problem, our paper fully characterizes the set of positional scoring rules that can be computed for any $1 \leq t < m$, which notably does not include plurality. We then extend this to show a similar impossibility result for single transferable vote (elimination voting). These negative results are information-theoretic and agnostic to the number of queries. Finally, for scoring rules that are computable with limited-sized queries, we give parameterized upper and lower bounds on the number of such queries a deterministic or randomized algorithm must make to determine the score-maximizing candidate. While there is no gap between our bounds for deterministic algorithms, identifying the exact query complexity for randomized algorithms is a challenging open problem, of which we solve one special case.
- [1731] arXiv:2402.11122 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Navigating the Dual Facets: A Comprehensive Evaluation of Sequential Memory Editing in Large Language ModelsZihao Lin , Mohammad Beigi , Hongxuan Li , Yufan Zhou , Yuxiang Zhang , Qifan Wang , Wenpeng Yin , Lifu HuangComments: preprint, 15 pagesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Memory Editing (ME) has emerged as an efficient method to modify erroneous facts or inject new facts into Large Language Models (LLMs). Two mainstream ME methods exist: parameter-modifying ME and parameter-preserving ME (integrating extra modules while preserving original parameters). Regrettably, previous studies on ME evaluation have two critical limitations: (i) evaluating LLMs with single edit only, neglecting the need for continuous editing, and (ii) evaluations focusing solely on basic factual triples, overlooking broader LLM capabilities like logical reasoning and reading understanding. This study addresses these limitations with contributions threefold: (i) We explore how ME affects a wide range of fundamental capabilities of LLMs under sequential editing. Experimental results reveal an intriguing phenomenon: Most parameter-modifying ME consistently degrade performance across all tasks after a few sequential edits. In contrast, parameter-preserving ME effectively maintains LLMs' fundamental capabilities but struggles to accurately recall edited knowledge presented in a different format. (ii) We extend our evaluation to different editing settings, such as layers to edit, model size, instruction tuning, etc. Experimental findings indicate several strategies that can potentially mitigate the adverse effects of ME. (iii) We further explain why parameter-modifying ME damages LLMs from three dimensions: parameter changes after editing, language modeling capability, and the in-context learning capability. Our in-depth study advocates more careful use of ME in real-world scenarios.
- [1732] arXiv:2402.11123 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Optimizing Warfarin Dosing Using Contextual Bandit: An Offline Policy Learning and Evaluation MethodSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Warfarin, an anticoagulant medication, is formulated to prevent and address conditions associated with abnormal blood clotting, making it one of the most prescribed drugs globally. However, determining the suitable dosage remains challenging due to individual response variations, and prescribing an incorrect dosage may lead to severe consequences. Contextual bandit and reinforcement learning have shown promise in addressing this issue. Given the wide availability of observational data and safety concerns of decision-making in healthcare, we focused on using exclusively observational data from historical policies as demonstrations to derive new policies; we utilized offline policy learning and evaluation in a contextual bandit setting to establish the optimal personalized dosage strategy. Our learned policies surpassed these baseline approaches without genotype inputs, even when given a suboptimal demonstration, showcasing promising application potential.
- [1733] arXiv:2402.11131 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Speculative Streaming: Fast LLM Inference without Auxiliary ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Speculative decoding is a prominent technique to speed up the inference of a large target language model based on predictions of an auxiliary draft model. While effective, in application-specific settings, it often involves fine-tuning both draft and target models to achieve high acceptance rates. As the number of downstream tasks grows, these draft models add significant complexity to inference systems. We propose Speculative Streaming, a single-model speculative decoding method that fuses drafting into the target model by changing the fine-tuning objective from next token prediction to future n-gram prediction. Speculative Streaming speeds up decoding by 1.8 - 3.1X in a diverse set of tasks, such as Summarization, Structured Queries, and Meaning Representation, without sacrificing generation quality. Additionally, Speculative Streaming is parameter-efficient. It achieves on-par/higher speed-ups than Medusa-style architectures while using ~10000X fewer extra parameters, making it well-suited for resource-constrained devices.
- [1734] arXiv:2402.11138 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Contrastive Instruction TuningTianyi Yan , Fei Wang , James Y. Huang , Wenxuan Zhou , Fan Yin , Aram Galstyan , Wenpeng Yin , Muhao ChenSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Instruction tuning has been used as a promising approach to improve the performance of large language models (LLMs) on unseen tasks. However, current LLMs exhibit limited robustness to unseen instructions, generating inconsistent outputs when the same instruction is phrased with slightly varied forms or language styles. This behavior indicates LLMs' lack of robustness to textual variations and generalizability to unseen instructions, potentially leading to trustworthiness issues. Accordingly, we propose Contrastive Instruction Tuning, which maximizes the similarity between the hidden representations of semantically equivalent instruction-instance pairs while minimizing the similarity between semantically different ones. To facilitate this approach, we augment the existing FLAN collection by paraphrasing task instructions. Experiments on the PromptBench benchmark show that CoIN consistently improves LLMs' robustness to unseen instructions with variations across character, word, sentence, and semantic levels by an average of +2.5% in accuracy.
- [1735] arXiv:2402.11139 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: LiGNN: Graph Neural Networks at LinkedInFedor Borisyuk , Shihai He , Yunbo Ouyang , Morteza Ramezani , Peng Du , Xiaochen Hou , Chengming Jiang , Nitin Pasumarthy , Priya Bannur , Birjodh Tiwana , Ping Liu , Siddharth Dangi , Daqi Sun , Zhoutao Pei , Xiao Shi , Sirou Zhu , Qianqi Shen , Kuang-Hsuan Lee , David Stein , Baolei Li , Haichao Wei , Amol Ghoting , Souvik GhoshSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In this paper, we present LiGNN, a deployed large-scale Graph Neural Networks (GNNs) Framework. We share our insight on developing and deployment of GNNs at large scale at LinkedIn. We present a set of algorithmic improvements to the quality of GNN representation learning including temporal graph architectures with long term losses, effective cold start solutions via graph densification, ID embeddings and multi-hop neighbor sampling. We explain how we built and sped up by 7x our large-scale training on LinkedIn graphs with adaptive sampling of neighbors, grouping and slicing of training data batches, specialized shared-memory queue and local gradient optimization. We summarize our deployment lessons and learnings gathered from A/B test experiments. The techniques presented in this work have contributed to an approximate relative improvements of 1% of Job application hearing back rate, 2% Ads CTR lift, 0.5% of Feed engaged daily active users, 0.2% session lift and 0.1% weekly active user lift from people recommendation. We believe that this work can provide practical solutions and insights for engineers who are interested in applying Graph neural networks at large scale.
- [1736] arXiv:2402.11140 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Boosting of Thoughts: Trial-and-Error Problem Solving with Large Language ModelsComments: Accepted as a poster paper by ICLR2024. 27 pages, 5 figures, 18 tables. [Source Code]( this https URL )Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The reasoning performance of Large Language Models (LLMs) on a wide range of problems critically relies on chain-of-thought prompting, which involves providing a few chain of thought demonstrations as exemplars in prompts. Recent work, e.g., Tree of Thoughts, has pointed out the importance of exploration and self-evaluation in reasoning step selection for complex problem solving. In this paper, we present Boosting of Thoughts (BoT), an automated prompting framework for problem solving with LLMs by iteratively exploring and self-evaluating many trees of thoughts in order to acquire an ensemble of trial-and-error reasoning experiences, which will serve as a new form of prompting to solve the complex problem. Starting from a simple prompt without requiring examples, BoT iteratively explores and evaluates a large collection of reasoning steps, and more importantly, uses error analysis obtained from the LLM on them to explicitly revise prompting, which in turn enhances reasoning step generation, until a final answer is attained. Our experiments with GPT-4 and Llama2 across extensive complex mathematical problems demonstrate that BoT consistently achieves higher or comparable problem-solving rates than other advanced prompting approaches.
- [1737] arXiv:2402.11161 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: PANDA (Pedantic ANswer-correctness Determination and Adjudication):Improving Automatic Evaluation for Question Answering and Text GenerationComments: 18 pages, 5 figures, 11 tables. arXiv admin note: substantial text overlap with arXiv:2401.13170Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Question answering (QA) can only make progress if we know if an answer is correct, but for many of the most challenging and interesting QA examples, current answer correctness (AC) metrics do not align with human judgments, particularly verbose, free form answers from large language models (LLM). There are two challenges: a lack of data and that models are too big. LLM based scorers correlate better with humans, but this expensive task has only been tested on limited QA datasets. We rectify these issues by providing clear guidelines for evaluating machine QA adopted from human QA contests. We also introduce Precise ANswer correctness Determination and Adjudication (PANDA), a small, efficient, deterministic AC classifier (812 KB) that more accurately evaluates answer correctness.
- [1738] arXiv:2402.11167 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Token-Ensemble Text Generation: On Attacking the Automatic AI-Generated Text DetectionComments: Submitted to ACL 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The robustness of AI-content detection models against cultivated attacks (e.g., paraphrasing or word switching) remains a significant concern. This study proposes a novel token-ensemble generation strategy to challenge the robustness of current AI-content detection approaches. We explore the ensemble attack strategy by completing the prompt with the next token generated from random candidate LLMs. We find the token-ensemble approach significantly drops the performance of AI-content detection models (The code and test sets will be released). Our findings reveal that token-ensemble generation poses a vital challenge to current detection models and underlines the need for advancing detection technologies to counter sophisticated adversarial strategies.
- [1739] arXiv:2402.11168 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Trust Regions for Explanations via Black-Box Probabilistic CertificationJournal-ref: ICML 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Given the black box nature of machine learning models, a plethora of explainability methods have been developed to decipher the factors behind individual decisions. In this paper, we introduce a novel problem of black box (probabilistic) explanation certification. We ask the question: Given a black box model with only query access, an explanation for an example and a quality metric (viz. fidelity, stability), can we find the largest hypercube (i.e., $\ell_{\infty}$ ball) centered at the example such that when the explanation is applied to all examples within the hypercube, (with high probability) a quality criterion is met (viz. fidelity greater than some value)? Being able to efficiently find such a \emph{trust region} has multiple benefits: i) insight into model behavior in a \emph{region}, with a \emph{guarantee}; ii) ascertained \emph{stability} of the explanation; iii) \emph{explanation reuse}, which can save time, energy and money by not having to find explanations for every example; and iv) a possible \emph{meta-metric} to compare explanation methods. Our contributions include formalizing this problem, proposing solutions, providing theoretical guarantees for these solutions that are computable, and experimentally showing their efficacy on synthetic and real data.
- [1740] arXiv:2402.11176 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: KnowTuning: Knowledge-aware Fine-tuning for Large Language ModelsYougang Lyu , Lingyong Yan , Shuaiqiang Wang , Haibo Shi , Dawei Yin , Pengjie Ren , Zhumin Chen , Maarten de Rijke , Zhaochun RenSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Despite their success at many natural language processing (NLP) tasks, large language models still struggle to effectively leverage knowledge for knowledge-intensive tasks, manifesting limitations such as generating incomplete, non-factual, or illogical answers. These limitations stem from inadequate knowledge awareness of LLMs during vanilla fine-tuning. To address these problems, we propose a knowledge-aware fine-tuning (KnowTuning) method to improve fine-grained and coarse-grained knowledge awareness of LLMs. We devise a fine-grained knowledge augmentation stage to train LLMs to identify difficult fine-grained knowledge in answers. We also propose a coarse-grained knowledge comparison stage to train LLMs to distinguish between reliable and unreliable knowledge, in three aspects: completeness, factuality, and logicality. Extensive experiments on both generic and medical question answering (QA) datasets confirm the effectiveness of KnowTuning, through automatic and human evaluations, across various sizes of LLMs. We further verify that KnowTuning generates more facts with less factual error rate under fine-grained facts evaluation.
- [1741] arXiv:2402.11187 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: LaCo: Large Language Model Pruning via Layer CollapseSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large language models (LLMs) based on transformer are witnessing a notable trend of size expansion, which brings considerable costs to both model training and inference. However, existing methods such as model quantization, knowledge distillation, and model pruning are constrained by various issues, including hardware support limitations, the need for extensive training, and alterations to the internal structure of the model. In this paper, we propose a concise layer-wise pruning method called \textit{Layer Collapse (LaCo)}, in which rear model layers collapse into a prior layer, enabling a rapid reduction in model size while preserving the model structure. Comprehensive experiments show that our method maintains an average task performance of over 80\% at pruning ratios of 25-30\%, significantly outperforming existing state-of-the-art structured pruning methods. We also conduct post-training experiments to confirm that the proposed pruning method effectively inherits the parameters of the original model. Finally, we discuss our motivation from the perspective of layer-wise similarity and evaluate the performance of the pruned LLMs across various pruning ratios.
- [1742] arXiv:2402.11192 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: I Learn Better If You Speak My Language: Enhancing Large Language Model Fine-Tuning with Style-Aligned Response AdjustmentsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Fine-tuning large language models (LLMs) with a small data set for particular tasks is a widely encountered yet complex challenge. The potential for overfitting on a limited number of examples can negatively impact the model's ability to generalize and retain its original skills. Our research explores the impact of the style of ground-truth responses during the fine-tuning process. We found that matching the ground-truth response style with the LLM's inherent style results in better learning outcomes. Building on this insight, we developed a method that minimally alters the LLM's pre-existing responses to correct errors, using these adjusted responses as training targets. This technique enables precise corrections in line with the model's native response style, safeguarding the model's core capabilities and thus avoid overfitting. Our findings show that this approach not only improves the LLM's task-specific accuracy but also crucially maintains its original competencies and effectiveness.
- [1743] arXiv:2402.11196 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Maintaining Adversarial Robustness in Continuous LearningXiaolei Ru , Xiaowei Cao , Zijia Liu , Jack Murdoch Moore , Xin-Ya Zhang , Xia Zhu , Wenjia Wei , Gang YanSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Adversarial robustness is essential for security and reliability of machine learning systems. However, the adversarial robustness gained by sophisticated defense algorithms is easily erased as the neural network evolves to learn new tasks. This vulnerability can be addressed by fostering a novel capability for neural networks, termed continual robust learning, which focuses on both the (classification) performance and adversarial robustness on previous tasks during continuous learning. To achieve continuous robust learning, we propose an approach called Double Gradient Projection that projects the gradients for weight updates orthogonally onto two crucial subspaces -- one for stabilizing the smoothed sample gradients and another for stabilizing the final outputs of the neural network. The experimental results on four benchmarks demonstrate that the proposed approach effectively maintains continuous robustness against strong adversarial attacks, outperforming the baselines formed by combining the existing defense strategies and continual learning methods.
- [1744] arXiv:2402.11203 (cross-list from cs.IR) [ pdf , ps , other ]
-
Title: Exploring ChatGPT for Next-generation Information Retrieval: Opportunities and ChallengesComments: Survey PaperJournal-ref: Web Intelligence, vol. 22, no. 1, pp. 31-44, 2024Subjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: The rapid advancement of artificial intelligence (AI) has highlighted ChatGPT as a pivotal technology in the field of information retrieval (IR). Distinguished from its predecessors, ChatGPT offers significant benefits that have attracted the attention of both the industry and academic communities. While some view ChatGPT as a groundbreaking innovation, others attribute its success to the effective integration of product development and market strategies. The emergence of ChatGPT, alongside GPT-4, marks a new phase in Generative AI, generating content that is distinct from training examples and exceeding the capabilities of the prior GPT-3 model by OpenAI. Unlike the traditional supervised learning approach in IR tasks, ChatGPT challenges existing paradigms, bringing forth new challenges and opportunities regarding text quality assurance, model bias, and efficiency. This paper seeks to examine the impact of ChatGPT on IR tasks and offer insights into its potential future developments.
- [1745] arXiv:2402.11208 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based AgentsComments: The first two authors contribute equally. Code and data are available at this https URLSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Leveraging the rapid development of Large Language Models LLMs, LLM-based agents have been developed to handle various real-world applications, including finance, healthcare, and shopping, etc. It is crucial to ensure the reliability and security of LLM-based agents during applications. However, the safety issues of LLM-based agents are currently under-explored. In this work, we take the first step to investigate one of the typical safety threats, backdoor attack, to LLM-based agents. We first formulate a general framework of agent backdoor attacks, then we present a thorough analysis on the different forms of agent backdoor attacks. Specifically, from the perspective of the final attacking outcomes, the attacker can either choose to manipulate the final output distribution, or only introduce malicious behavior in the intermediate reasoning process, while keeping the final output correct. Furthermore, the former category can be divided into two subcategories based on trigger locations: the backdoor trigger can be hidden either in the user query or in an intermediate observation returned by the external environment. We propose the corresponding data poisoning mechanisms to implement the above variations of agent backdoor attacks on two typical agent tasks, web shopping and tool utilization. Extensive experiments show that LLM-based agents suffer severely from backdoor attacks, indicating an urgent need for further research on the development of defenses against backdoor attacks on LLM-based agents. Warning: This paper may contain biased content.
- [1746] arXiv:2402.11241 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: DiffPoint: Single and Multi-view Point Cloud Reconstruction with ViT Based Diffusion ModelSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: As the task of 2D-to-3D reconstruction has gained significant attention in various real-world scenarios, it becomes crucial to be able to generate high-quality point clouds. Despite the recent success of deep learning models in generating point clouds, there are still challenges in producing high-fidelity results due to the disparities between images and point clouds. While vision transformers (ViT) and diffusion models have shown promise in various vision tasks, their benefits for reconstructing point clouds from images have not been demonstrated yet. In this paper, we first propose a neat and powerful architecture called DiffPoint that combines ViT and diffusion models for the task of point cloud reconstruction. At each diffusion step, we divide the noisy point clouds into irregular patches. Then, using a standard ViT backbone that treats all inputs as tokens (including time information, image embeddings, and noisy patches), we train our model to predict target points based on input images. We evaluate DiffPoint on both single-view and multi-view reconstruction tasks and achieve state-of-the-art results. Additionally, we introduce a unified and flexible feature fusion module for aggregating image features from single or multiple input images. Furthermore, our work demonstrates the feasibility of applying unified architectures across languages and images to improve 3D reconstruction tasks.
- [1747] arXiv:2402.11242 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Learning with Imbalanced Noisy Data by Preventing Bias in Sample SelectionComments: accepted by IEEE Transactions on MultimediaSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Learning with noisy labels has gained increasing attention because the inevitable imperfect labels in real-world scenarios can substantially hurt the deep model performance. Recent studies tend to regard low-loss samples as clean ones and discard high-loss ones to alleviate the negative impact of noisy labels. However, real-world datasets contain not only noisy labels but also class imbalance. The imbalance issue is prone to causing failure in the loss-based sample selection since the under-learning of tail classes also leans to produce high losses. To this end, we propose a simple yet effective method to address noisy labels in imbalanced datasets. Specifically, we propose Class-Balance-based sample Selection (CBS) to prevent the tail class samples from being neglected during training. We propose Confidence-based Sample Augmentation (CSA) for the chosen clean samples to enhance their reliability in the training process. To exploit selected noisy samples, we resort to prediction history to rectify labels of noisy samples. Moreover, we introduce the Average Confidence Margin (ACM) metric to measure the quality of corrected labels by leveraging the model's evolving training dynamics, thereby ensuring that low-quality corrected noisy samples are appropriately masked out. Lastly, consistency regularization is imposed on filtered label-corrected noisy samples to boost model performance. Comprehensive experimental results on synthetic and real-world datasets demonstrate the effectiveness and superiority of our proposed method, especially in imbalanced scenarios. Comprehensive experimental results on synthetic and real-world datasets demonstrate the effectiveness and superiority of our proposed method, especially in imbalanced scenarios.
- [1748] arXiv:2402.11243 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Can Large Language Models perform Relation-based Argument Mining?Comments: 10 pages, 9 figures, submitted to ACL 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Argument mining (AM) is the process of automatically extracting arguments, their components and/or relations amongst arguments and components from text. As the number of platforms supporting online debate increases, the need for AM becomes ever more urgent, especially in support of downstream tasks. Relation-based AM (RbAM) is a form of AM focusing on identifying agreement (support) and disagreement (attack) relations amongst arguments. RbAM is a challenging classification task, with existing methods failing to perform satisfactorily. In this paper, we show that general-purpose Large Language Models (LLMs), appropriately primed and prompted, can significantly outperform the best performing (RoBERTa-based) baseline. Specifically, we experiment with two open-source LLMs (Llama-2 and Mistral) with ten datasets.
- [1749] arXiv:2402.11253 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Aligning Large Language Models by On-Policy Self-JudgmentComments: Preprint; Under reviewSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Existing approaches for aligning large language models with human preferences face a trade-off that requires a separate reward model (RM) for on-policy learning. In this paper, we present a novel alignment framework, \method{} that (1) does on-policy learning and 2) is parameter efficient, as it does not require an additional RM for evaluating the samples for on-policy learning. To this end, we propose Judge-augmented Supervised Fine-Tuning (JSFT) to train a single model to act as both a policy and a judge. Specifically, we view the pairwise judgment task, choosing the better response from a response pair, as a special case of the instruction-following task. The resulting model can judge preferences of on-the-fly responses from current policy initialized from itself. Experimental results show the efficacy of \method{}, outperforming baselines in preference benchmarks. We also show that the rejecting sampling by itself can improve performance further without an additional evaluator.
- [1750] arXiv:2402.11260 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: MoRAL: MoE Augmented LoRA for LLMs' Lifelong LearningSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Adapting large language models (LLMs) to new domains/tasks and enabling them to be efficient lifelong learners is a pivotal challenge. In this paper, we propose MoRAL, i.e., Mixture-of-Experts augmented Low-Rank Adaptation for Lifelong Learning. MoRAL combines the multi-tasking abilities of MoE with the fine-tuning abilities of LoRA for effective life-long learning of LLMs. In contrast to the conventional approaches that use factual triplets as inputs MoRAL relies on simple question-answer pairs, which is a more practical and effective strategy for robust and efficient learning. Owing to new data settings, we introduce a new evaluation benchmark namely: Life Long Learning of LLM (5L-bench) encompassing a newly curated dataset of question-answer pairs, and a set of evaluation metrics for rigorous evaluation of MoRAL in open-book and closed-book settings. Experimental evaluation shows (i) LLMs learn fast in open-book settings with up to 30.15% improvement in "RA" for Phi-2-2.7B compared to closed-book (for models fine-tuned with MoRAL); (ii) MoRAL shows higher performance improvement for models with a greater number of parameters; (iii) MoRAL is robust to catastrophic forgetting offering better knowledge retention compared to baselines.
- [1751] arXiv:2402.11273 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Semi-supervised Medical Image Segmentation Method Based on Cross-pseudo Labeling Leveraging Strong and Weak Data Augmentation StrategiesYifei Chen , Chenyan Zhang , Yifan Ke , Yiyu Huang , Xuezhou Dai , Feiwei Qin , Yongquan Zhang , Xiaodong Zhang , Changmiao WangComments: 5 pages, 2 figures, accept ISBI2024Journal-ref: ISBI 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Traditional supervised learning methods have historically encountered certain constraints in medical image segmentation due to the challenging collection process, high labeling cost, low signal-to-noise ratio, and complex features characterizing biomedical images. This paper proposes a semi-supervised model, DFCPS, which innovatively incorporates the Fixmatch concept. This significantly enhances the model's performance and generalizability through data augmentation processing, employing varied strategies for unlabeled data. Concurrently, the model design gives appropriate emphasis to the generation, filtration, and refinement processes of pseudo-labels. The novel concept of cross-pseudo-supervision is introduced, integrating consistency learning with self-training. This enables the model to fully leverage pseudo-labels from multiple perspectives, thereby enhancing training diversity. The DFCPS model is compared with both baseline and advanced models using the publicly accessible Kvasir-SEG dataset. Across all four subdivisions containing different proportions of unlabeled data, our model consistently exhibits superior performance. Our source code is available at this https URL .
- [1752] arXiv:2402.11279 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Multi-Perspective Consistency Enhances Confidence Estimation in Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In the deployment of large language models (LLMs), accurate confidence estimation is critical for assessing the credibility of model predictions. However, existing methods often fail to overcome the issue of overconfidence on incorrect answers. In this work, we focus on improving the confidence estimation of large language models. Considering the fragility of self-awareness in language models, we introduce a Multi-Perspective Consistency (MPC) method. We leverage complementary insights from different perspectives within models (MPC-Internal) and across different models (MPC-Across) to mitigate the issue of overconfidence arising from a singular viewpoint. The experimental results on eight publicly available datasets show that our MPC achieves state-of-the-art performance. Further analyses indicate that MPC can mitigate the problem of overconfidence and is effectively scalable to other models.
- [1753] arXiv:2402.11285 (cross-list from cs.NI) [ pdf , ps , other ]
-
Title: Fair Resource Allocation in Virtualized O-RAN PlatformsComments: to appear in ACM Sigmetrics 2024Subjects: Networking and Internet Architecture (cs.NI) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: O-RAN systems and their deployment in virtualized general-purpose computing platforms (O-Cloud) constitute a paradigm shift expected to bring unprecedented performance gains. However, these architectures raise new implementation challenges and threaten to worsen the already-high energy consumption of mobile networks. This paper presents first a series of experiments which assess the O-Cloud's energy costs and their dependency on the servers' hardware, capacity and data traffic properties which, typically, change over time. Next, it proposes a compute policy for assigning the base station data loads to O-Cloud servers in an energy-efficient fashion; and a radio policy that determines at near-real-time the minimum transmission block size for each user so as to avoid unnecessary energy costs. The policies balance energy savings with performance, and ensure that both of them are dispersed fairly across the servers and users, respectively. To cater for the unknown and time-varying parameters affecting the policies, we develop a novel online learning framework with fairness guarantees that apply to the entire operation horizon of the system (long-term fairness). The policies are evaluated using trace-driven simulations and are fully implemented in an O-RAN compatible system where we measure the energy costs and throughput in realistic scenarios.
- [1754] arXiv:2402.11291 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Puzzle Solving using Reasoning of Large Language Models: A SurveySubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Exploring the capabilities of Large Language Models (LLMs) in puzzle solving unveils critical insights into their potential and challenges in AI, marking a significant step towards understanding their applicability in complex reasoning tasks. This survey leverages a unique taxonomy -- dividing puzzles into rule-based and rule-less categories -- to critically assess LLMs through various methodologies, including prompting techniques, neuro-symbolic approaches, and fine-tuning. Through a critical review of relevant datasets and benchmarks, we assess LLMs' performance, identifying significant challenges in complex puzzle scenarios. Our findings highlight the disparity between LLM capabilities and human-like reasoning, particularly in those requiring advanced logical inference. The survey underscores the necessity for novel strategies and richer datasets to advance LLMs' puzzle-solving proficiency and contribute to AI's logical reasoning and creative problem-solving advancements.
- [1755] arXiv:2402.11296 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Dissecting Human and LLM PreferencesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: As a relative quality comparison of model responses, human and Large Language Model (LLM) preferences serve as common alignment goals in model fine-tuning and criteria in evaluation. Yet, these preferences merely reflect broad tendencies, resulting in less explainable and controllable models with potential safety risks. In this work, we dissect the preferences of human and 32 different LLMs to understand their quantitative composition, using annotations from real-world user-model conversations for a fine-grained, scenario-wise analysis. We find that humans are less sensitive to errors, favor responses that support their stances, and show clear dislike when models admit their limits. On the contrary, advanced LLMs like GPT-4-Turbo emphasize correctness, clarity, and harmlessness more. Additionally, LLMs of similar sizes tend to exhibit similar preferences, regardless of their training methods, and fine-tuning for alignment does not significantly alter the preferences of pretrained-only LLMs. Finally, we show that preference-based evaluation can be intentionally manipulated. In both training-free and training-based settings, aligning a model with the preferences of judges boosts scores, while injecting the least preferred properties lowers them. This results in notable score shifts: up to 0.59 on MT-Bench (1-10 scale) and 31.94 on AlpacaEval 2.0 (0-100 scale), highlighting the significant impact of this strategic adaptation. Interactive Demo: this https URL Dataset: this https URL Code: this https URL
- [1756] arXiv:2402.11314 (cross-list from cs.MA) [ pdf , ps , other ]
-
Title: Multi-Generative Agent Collective Decision-Making in Urban Planning: A Case Study for Kendall Square RenovationSubjects: Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI)
Abstract: In this study, we develop a multiple-generative agent system to simulate community decision-making for the redevelopment of Kendall Square's Volpe building. Drawing on interviews with local stakeholders, our simulations incorporated varying degrees of communication, demographic data, and life values in the agent prompts. The results revealed that communication among agents improved collective reasoning, while the inclusion of demographic and life values led to more distinct opinions. These findings highlight the potential application of AI in understanding complex social interactions and decision-making processes, offering valuable insights for urban planning and community engagement in diverse settings like Kendall Square.
- [1757] arXiv:2402.11317 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Debiased Offline Representation Learning for Fast Online Adaptation in Non-stationary DynamicsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Developing policies that can adjust to non-stationary environments is essential for real-world reinforcement learning applications. However, learning such adaptable policies in offline settings, with only a limited set of pre-collected trajectories, presents significant challenges. A key difficulty arises because the limited offline data makes it hard for the context encoder to differentiate between changes in the environment dynamics and shifts in the behavior policy, often leading to context misassociations. To address this issue, we introduce a novel approach called Debiased Offline Representation for fast online Adaptation (DORA). DORA incorporates an information bottleneck principle that maximizes mutual information between the dynamics encoding and the environmental data, while minimizing mutual information between the dynamics encoding and the actions of the behavior policy. We present a practical implementation of DORA, leveraging tractable bounds of the information bottleneck principle. Our experimental evaluation across six benchmark MuJoCo tasks with variable parameters demonstrates that DORA not only achieves a more precise dynamics encoding but also significantly outperforms existing baselines in terms of performance.
- [1758] arXiv:2402.11319 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: Hysteresis Compensation of Flexible Continuum Manipulator using RGBD Sensing and Temporal Convolutional NetworkComments: 8 pages, 11 figures, 5 tablesJournal-ref: IEEE Robotics and Automation Letters (2024)Subjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: Flexible continuum manipulators are valued for minimally invasive surgery, offering access to confined spaces through nonlinear paths. However, cable-driven manipulators face control difficulties due to hysteresis from cabling effects such as friction, elongation, and coupling. These effects are difficult to model due to nonlinearity and the difficulties become even more evident when dealing with long and coupled, multi-segmented manipulator. This paper proposes a data-driven approach based on Deep Neural Networks (DNN) to capture these nonlinear and previous states-dependent characteristics of cable actuation. We collect physical joint configurations according to command joint configurations using RGBD sensing and 7 fiducial markers to model the hysteresis of the proposed manipulator. Result on a study comparing the estimation performance of four DNN models show that the Temporal Convolution Network (TCN) demonstrates the highest predictive capability. Leveraging trained TCNs, we build a control algorithm to compensate for hysteresis. Tracking tests in task space using unseen trajectories show that the proposed control algorithm reduces the average position and orientation error by 61.39% (from 13.7mm to 5.29 mm) and 64.04% (from 31.17° to 11.21°), respectively. This result implies that the proposed calibrated controller effectively reaches the desired configurations by estimating the hysteresis of the manipulator. Applying this method in real surgical scenarios has the potential to enhance control precision and improve surgical performance.
- [1759] arXiv:2402.11322 (cross-list from cs.NE) [ pdf , ps , html , other ]
-
Title: SpikeNAS: A Fast Memory-Aware Neural Architecture Search Framework for Spiking Neural Network-based Autonomous AgentsComments: 8 pages, 13 figures, 2 tablesSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Autonomous mobile agents (e.g., UAVs and UGVs) are typically expected to incur low power/energy consumption for solving machine learning tasks (such as object recognition), as these mobile agents are usually powered by portable batteries. These requirements can be fulfilled by Spiking Neural Networks (SNNs), since their bio-inspired spike-based operations offer high accuracy and ultra low-power/energy computation. Currently, most of the SNN architectures are derived from Artificial Neural Networks whose neurons' architectures and operations are different from SNNs, or developed without considering memory budgets from the underlying processing hardware of autonomous mobile agents. These limitations hinder SNNs from reaching their full potential in accuracy and efficiency. Toward this, we propose SpikeNAS, a novel fast memory-aware neural architecture search (NAS) framework for SNNs that quickly finds an appropriate SNN architecture with high accuracy under the given memory budgets from autonomous mobile agents. To do this, our SpikeNAS employs several key steps: analyzing the impacts of network operations on the accuracy, enhancing the network architecture to improve the learning quality, and developing a fast memory-aware search algorithm. The experimental results show that our SpikeNAS improves the searching time and maintains high accuracy as compared to state-of-the-art while meeting the given memory budgets (e.g., 4.4x faster search with 1.3% accuracy improvement for CIFAR100, using an Nvidia RTX 6000 Ada GPU machine), thereby quickly providing the appropriate SNN architecture for the memory-constrained autonomous mobile agents.
- [1760] arXiv:2402.11337 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Learning by Reconstruction Produces Uninformative Features For PerceptionSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: Input space reconstruction is an attractive representation learning paradigm. Despite interpretability of the reconstruction and generation, we identify a misalignment between learning by reconstruction, and learning for perception. We show that the former allocates a model's capacity towards a subspace of the data explaining the observed variance--a subspace with uninformative features for the latter. For example, the supervised TinyImagenet task with images projected onto the top subspace explaining 90\% of the pixel variance can be solved with 45\% test accuracy. Using the bottom subspace instead, accounting for only 20\% of the pixel variance, reaches 55\% test accuracy. The features for perception being learned last explains the need for long training time, e.g., with Masked Autoencoders. Learning by denoising is a popular strategy to alleviate that misalignment. We prove that while some noise strategies such as masking are indeed beneficial, others such as additive Gaussian noise are not. Yet, even in the case of masking, we find that the benefits vary as a function of the mask's shape, ratio, and the considered dataset. While tuning the noise strategy without knowledge of the perception task seems challenging, we provide first clues on how to detect if a noise strategy is never beneficial regardless of the perception task.
- [1761] arXiv:2402.11338 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Fair Classification with Partial Feedback: An Exploration-Based Data-Collection ApproachSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (stat.ML)
Abstract: In many predictive contexts (e.g., credit lending), true outcomes are only observed for samples that were positively classified in the past. These past observations, in turn, form training datasets for classifiers that make future predictions. However, such training datasets lack information about the outcomes of samples that were (incorrectly) negatively classified in the past and can lead to erroneous classifiers. We present an approach that trains a classifier using available data and comes with a family of exploration strategies to collect outcome data about subpopulations that otherwise would have been ignored. For any exploration strategy, the approach comes with guarantees that (1) all sub-populations are explored, (2) the fraction of false positives is bounded, and (3) the trained classifier converges to a "desired" classifier. The right exploration strategy is context-dependent; it can be chosen to improve learning guarantees and encode context-specific group fairness properties. Evaluation on real-world datasets shows that this approach consistently boosts the quality of collected outcome data and improves the fraction of true positives for all groups, with only a small reduction in predictive utility.
- [1762] arXiv:2402.11349 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Tasks That Language Models Don't LearnSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: We argue that there are certain properties of language that our current large language models (LLMs) don't learn. We present an empirical investigation of visual-auditory properties of language through a series of tasks, termed H-TEST. This benchmark highlights a fundamental gap between human linguistic comprehension, which naturally integrates sensory experiences, and the sensory-deprived processing capabilities of LLMs. In support of our hypothesis, 1. deliberate reasoning (Chain-of-Thought), 2. few-shot examples, or 3. stronger LLM from the same model family (LLaMA 2 13B -> LLaMA 2 70B) do not trivially bring improvements in H-TEST performance. Therefore, we make a particular connection to the philosophical case of Mary, who learns about the world in a sensory-deprived environment (Jackson, 1986). Our experiments show that some of the strongest proprietary LLMs stay near random chance baseline accuracy of 50%, highlighting the limitations of knowledge acquired in the absence of sensory experience.
- [1763] arXiv:2402.11353 (cross-list from cs.HC) [ pdf , ps , html , other ]
-
Title: Understanding the Impact of Long-Term Memory on Self-Disclosure with Large Language Model-Driven Chatbots for Public Health InterventionComments: Accepted to ACM CHI 2024 as a full paperJournal-ref: In Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24), May 11-16, 2024, Honolulu, HI, USA. ACM, New York, NY, USASubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Recent large language models (LLMs) offer the potential to support public health monitoring by facilitating health disclosure through open-ended conversations but rarely preserve the knowledge gained about individuals across repeated interactions. Augmenting LLMs with long-term memory (LTM) presents an opportunity to improve engagement and self-disclosure, but we lack an understanding of how LTM impacts people's interaction with LLM-driven chatbots in public health interventions. We examine the case of CareCall -- an LLM-driven voice chatbot with LTM -- through the analysis of 1,252 call logs and interviews with nine users. We found that LTM enhanced health disclosure and fostered positive perceptions of the chatbot by offering familiarity. However, we also observed challenges in promoting self-disclosure through LTM, particularly around addressing chronic health conditions and privacy concerns. We discuss considerations for LTM integration in LLM-driven chatbots for public health monitoring, including carefully deciding what topics need to be remembered in light of public health goals.
- [1764] arXiv:2402.11354 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Probabilistic Routing for Graph-Based Approximate Nearest Neighbor SearchComments: Source code will be released at GitHub soonSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Databases (cs.DB); Data Structures and Algorithms (cs.DS)
Abstract: Approximate nearest neighbor search (ANNS) in high-dimensional spaces is a pivotal challenge in the field of machine learning. In recent years, graph-based methods have emerged as the superior approach to ANNS, establishing a new state of the art. Although various optimizations for graph-based ANNS have been introduced, they predominantly rely on heuristic methods that lack formal theoretical backing. This paper aims to enhance routing within graph-based ANNS by introducing a method that offers a probabilistic guarantee when exploring a node's neighbors in the graph. We formulate the problem as probabilistic routing and develop two baseline strategies by incorporating locality-sensitive techniques. Subsequently, we introduce PEOs, a novel approach that efficiently identifies which neighbors in the graph should be considered for exact distance computation, thus significantly improving efficiency in practice. Our experiments demonstrate that equipping PEOs can increase throughput on a commonly utilized graph index (HNSW) by a factor of 1.6 to 2.5, and its efficiency consistently outperforms the leading-edge routing technique by 1.1 to 1.4 times.
- [1765] arXiv:2402.11363 (cross-list from q-bio.QM) [ pdf , ps , html , other ]
-
Title: Transformer-based de novo peptide sequencing for data-independent acquisition mass spectrometryComments: Ebrahimi S., Guo X. Transformer-based de novo peptide sequencing for data-independent acquisition mass spectrometry. In 2023 IEEE 23rd International Conference on Bioinformatics and Bioengineering (BIBE) 2022 Dec 6 (pp. 17-22). IEEESubjects: Quantitative Methods (q-bio.QM) ; Artificial Intelligence (cs.AI)
Abstract: Tandem mass spectrometry (MS/MS) stands as the predominant high-throughput technique for comprehensively analyzing protein content within biological samples. This methodology is a cornerstone driving the advancement of proteomics. In recent years, substantial strides have been made in Data-Independent Acquisition (DIA) strategies, facilitating impartial and non-targeted fragmentation of precursor ions. The DIA-generated MS/MS spectra present a formidable obstacle due to their inherent high multiplexing nature. Each spectrum encapsulates fragmented product ions originating from multiple precursor peptides. This intricacy poses a particularly acute challenge in de novo peptide/protein sequencing, where current methods are ill-equipped to address the multiplexing conundrum. In this paper, we introduce DiaTrans, a deep-learning model based on transformer architecture. It deciphers peptide sequences from DIA mass spectrometry data. Our results show significant improvements over existing STOA methods, including DeepNovo-DIA and PepNet. Casanovo-DIA enhances precision by 15.14% to 34.8%, recall by 11.62% to 31.94% at the amino acid level, and boosts precision by 59% to 81.36% at the peptide level. Integrating DIA data and our DiaTrans model holds considerable promise to uncover novel peptides and more comprehensive profiling of biological samples. Casanovo-DIA is freely available under the GNU GPL license at this https URL .
- [1766] arXiv:2402.11398 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Reasoning before Comparison: LLM-Enhanced Semantic Similarity Metrics for Domain Specialized Text AnalysisShaochen Xu , Zihao Wu , Huaqin Zhao , Peng Shu , Zhengliang Liu , Wenxiong Liao , Sheng Li , Andrea Sikora , Tianming Liu , Xiang LiSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In this study, we leverage LLM to enhance the semantic analysis and develop similarity metrics for texts, addressing the limitations of traditional unsupervised NLP metrics like ROUGE and BLEU. We develop a framework where LLMs such as GPT-4 are employed for zero-shot text identification and label generation for radiology reports, where the labels are then used as measurements for text similarity. By testing the proposed framework on the MIMIC data, we find that GPT-4 generated labels can significantly improve the semantic similarity assessment, with scores more closely aligned with clinical ground truth than traditional NLP metrics. Our work demonstrates the possibility of conducting semantic analysis of the text data using semi-quantitative reasoning results by the LLMs for highly specialized domains. While the framework is implemented for radiology report similarity analysis, its concept can be extended to other specialized domains as well.
- [1767] arXiv:2402.11417 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Various parameter-efficient fine-tuning (PEFT) techniques have been proposed to enable computationally efficient fine-tuning while maintaining model performance. However, existing PEFT methods are still limited by the growing number of trainable parameters with the rapid deployment of Large Language Models (LLMs). To address this challenge, we present LoRETTA, an ultra-parameter-efficient framework that significantly reduces trainable parameters through tensor-train decomposition. Specifically, we propose two methods, named {LoRETTA}$_{adp}$ and {LoRETTA}$_{rep}$. The former employs tensorized adapters, offering a high-performance yet lightweight approach for the fine-tuning of LLMs. The latter emphasizes fine-tuning via weight parameterization with a set of small tensor factors. LoRETTA achieves comparable or better performance than most widely used PEFT methods with up to $100\times$ fewer parameters on the LLaMA-2-7B models. Furthermore, empirical results demonstrate that the proposed method effectively improves training efficiency, enjoys better multi-task learning performance, and enhances the anti-overfitting capability. Plug-and-play codes built upon the Huggingface framework and PEFT library will be released.
- [1768] arXiv:2402.11424 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Data Distribution Distilled Generative Model for Generalized Zero-Shot RecognitionComments: accepted as AAAI 2024 oral paperSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: In the realm of Zero-Shot Learning (ZSL), we address biases in Generalized Zero-Shot Learning (GZSL) models, which favor seen data. To counter this, we introduce an end-to-end generative GZSL framework called D$^3$GZSL. This framework respects seen and synthesized unseen data as in-distribution and out-of-distribution data, respectively, for a more balanced model. D$^3$GZSL comprises two core modules: in-distribution dual space distillation (ID$^2$SD) and out-of-distribution batch distillation (O$^2$DBD). ID$^2$SD aligns teacher-student outcomes in embedding and label spaces, enhancing learning coherence. O$^2$DBD introduces low-dimensional out-of-distribution representations per batch sample, capturing shared structures between seen and unseen categories. Our approach demonstrates its effectiveness across established GZSL benchmarks, seamlessly integrating into mainstream generative frameworks. Extensive experiments consistently showcase that D$^3$GZSL elevates the performance of existing generative GZSL methods, underscoring its potential to refine zero-shot learning practices.The code is available at: this https URL
- [1769] arXiv:2402.11427 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: OptEx: Expediting First-Order Optimization with Approximately Parallelized IterationsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: First-order optimization (FOO) algorithms are pivotal in numerous computational domains such as machine learning and signal denoising. However, their application to complex tasks like neural network training often entails significant inefficiencies due to the need for many sequential iterations for convergence. In response, we introduce first-order optimization expedited with approximately parallelized iterations (OptEx), the first framework that enhances the efficiency of FOO by leveraging parallel computing to mitigate its iterative bottleneck. OptEx employs kernelized gradient estimation to make use of gradient history for future gradient prediction, enabling parallelization of iterations -- a strategy once considered impractical because of the inherent iterative dependency in FOO. We provide theoretical guarantees for the reliability of our kernelized gradient estimation and the iteration complexity of SGD-based OptEx, confirming that estimation errors diminish to zero as historical gradients accumulate and that SGD-based OptEx enjoys an effective acceleration rate of $\Omega(\sqrt{N})$ over standard SGD given parallelism of N. We also use extensive empirical studies, including synthetic functions, reinforcement learning tasks, and neural network training across various datasets, to underscore the substantial efficiency improvements achieved by OptEx.
- [1770] arXiv:2402.11436 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Perils of Self-Feedback: Self-Bias Amplifies in Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Recent studies show that self-feedback improves large language models (LLMs) on certain tasks while worsens other tasks. We discovered that such a contrary is due to LLM's bias towards their own output. In this paper, we formally define LLM's self-bias -- the tendency to favor its own generation -- using two statistics. We analyze six LLMs on translation, constrained text generation, and mathematical reasoning tasks. We find that self-bias is prevalent in all examined LLMs across multiple languages and tasks. Our analysis reveals that while the self-refine pipeline improves the fluency and understandability of model outputs, it further amplifies self-bias. To mitigate such biases, we discover that larger model size and external feedback with accurate assessment can significantly reduce bias in the self-refine pipeline, leading to actual performance improvement in downstream tasks.
- [1771] arXiv:2402.11441 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: InfuserKI: Enhancing Large Language Models with Knowledge Graphs via Infuser-Guided Knowledge IntegrationComments: 12 pages, 5 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Though Large Language Models (LLMs) have shown remarkable open-generation capabilities across diverse domains, they struggle with knowledge-intensive tasks. To alleviate this issue, knowledge integration methods have been proposed to enhance LLMs with domain-specific knowledge graphs using external modules. However, they suffer from data inefficiency as they require both known and unknown knowledge for fine-tuning. Thus, we study a novel problem of integrating unknown knowledge into LLMs efficiently without unnecessary overlap of known knowledge. Injecting new knowledge poses the risk of forgetting previously acquired knowledge. To tackle this, we propose a novel Infuser-Guided Knowledge Integration (InfuserKI) framework that utilizes transformer internal states to determine whether to enhance the original LLM output with additional information, thereby effectively mitigating knowledge forgetting. Evaluations on the UMLS-2.5k and MetaQA domain knowledge graphs demonstrate that InfuserKI can effectively acquire new knowledge and outperform state-of-the-art baselines by 9% and 6%, respectively, in reducing knowledge forgetting.
- [1772] arXiv:2402.11444 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Gauging Public Acceptance of Conditionally Automated Vehicles in the United StatesAntonios Saravanos (1), Eleftheria K. Pissadaki (1), Wayne S. Singh (1), Donatella Delfino (1) ((1) New York University)Subjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: Public acceptance of conditionally automated vehicles is a crucial step in the realization of smart cities. Prior research in Europe has shown that the factors of hedonic motivation, social influence, and performance expectancy, in decreasing order of importance, influence acceptance. Moreover, a generally positive acceptance of the technology was reported. However, there is a lack of information regarding the public acceptance of conditionally automated vehicles in the United States. In this study, we carried out a web-based experiment where participants were provided information regarding the technology and then completed a questionnaire on their perceptions. The collected data was analyzed using PLS-SEM to examine the factors that may lead to public acceptance of the technology in the United States. Our findings showed that social influence, performance expectancy, effort expectancy, hedonic motivation, and facilitating conditions determine conditionally automated vehicle acceptance. Additionally, certain factors were found to influence the perception of how useful the technology is, the effort required to use it, and the facilitating conditions for its use. By integrating the insights gained from this study, stakeholders can better facilitate the adoption of autonomous vehicle technology, contributing to safer, more efficient, and user-friendly transportation systems in the future that help realize the vision of the smart city.
- [1773] arXiv:2402.11451 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: SciAgent: Tool-augmented Language Models for Scientific ReasoningYubo Ma , Zhibin Gou , Junheng Hao , Ruochen Xu , Shuohang Wang , Liangming Pan , Yujiu Yang , Yixin Cao , Aixin Sun , Hany Awadalla , Weizhu ChenSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Scientific reasoning poses an excessive challenge for even the most advanced Large Language Models (LLMs). To make this task more practical and solvable for LLMs, we introduce a new task setting named tool-augmented scientific reasoning. This setting supplements LLMs with scalable toolsets, and shifts the focus from pursuing an omniscient problem solver to a proficient tool-user. To facilitate the research of such setting, we construct a tool-augmented training corpus named MathFunc which encompasses over 30,000 samples and roughly 6,000 tools. Building on MathFunc, we develop SciAgent to retrieve, understand and, if necessary, use tools for scientific problem solving. Additionally, we craft a benchmark, SciToolBench, spanning five scientific domains to evaluate LLMs' abilities with tool assistance. Extensive experiments on SciToolBench confirm the effectiveness of SciAgent. Notably, SciAgent-Mistral-7B surpasses other LLMs with the same size by more than 13% in absolute accuracy. Furthermore, SciAgent-DeepMath-7B shows much superior performance than ChatGPT.
- [1774] arXiv:2402.11459 (cross-list from q-bio.BM) [ pdf , ps , html , other ]
-
Title: Re-Dock: Towards Flexible and Realistic Molecular Docking with Diffusion BridgeYufei Huang , Odin Zhang , Lirong Wu , Cheng Tan , Haitao Lin , Zhangyang Gao , Siyuan Li , Stan.Z. LiSubjects: Biomolecules (q-bio.BM) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Chemical Physics (physics.chem-ph)
Abstract: Accurate prediction of protein-ligand binding structures, a task known as molecular docking is crucial for drug design but remains challenging. While deep learning has shown promise, existing methods often depend on holo-protein structures (docked, and not accessible in realistic tasks) or neglect pocket sidechain conformations, leading to limited practical utility and unrealistic conformation predictions. To fill these gaps, we introduce an under-explored task, named flexible docking to predict poses of ligand and pocket sidechains simultaneously and introduce Re-Dock, a novel diffusion bridge generative model extended to geometric manifolds. Specifically, we propose energy-to-geometry mapping inspired by the Newton-Euler equation to co-model the binding energy and conformations for reflecting the energy-constrained docking generative process. Comprehensive experiments on designed benchmark datasets including apo-dock and cross-dock demonstrate our model's superior effectiveness and efficiency over current methods.
- [1775] arXiv:2402.11463 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Attractor Memory for Long-Term Time Series Forecasting: A Chaos PerspectiveComments: arXiv admin note: substantial text overlap with arXiv:2008.07669 , arXiv:nlin/0307015 by other authorsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Chaotic Dynamics (nlin.CD)
Abstract: In long-term time series forecasting (LTSF) tasks, existing deep learning models overlook the crucial characteristic that discrete time series originate from underlying continuous dynamic systems, resulting in a lack of extrapolation and evolution capabilities. Recognizing the chaotic nature of real-world data, our model, \textbf{\textit{Attraos}}, incorporates chaos theory into LTSF, perceiving real-world time series as observations from unknown high-dimensional chaotic dynamic systems. Under the concept of attractor invariance, Attraos utilizes the proposed multi-scale dynamic memory unit to memorize historical dynamics structure and predicts by a frequency-enhanced local evolution strategy. Detailed theoretical analysis and abundant empirical evidence consistently show that Attraos outperforms various LTSF methods on mainstream LTSF datasets and chaotic datasets.
- [1776] arXiv:2402.11472 (cross-list from q-bio.BM) [ pdf , ps , other ]
-
Title: DDIPrompt: Drug-Drug Interaction Event Prediction based on Graph Prompt LearningSubjects: Biomolecules (q-bio.BM) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Recently, Graph Neural Networks have become increasingly prevalent in predicting adverse drug-drug interactions (DDI) due to their proficiency in modeling the intricate associations between atoms and functional groups within and across drug molecules. However, they are still hindered by two significant challenges: (1) the issue of highly imbalanced event distribution, which is a common but critical problem in medical datasets where certain interactions are vastly underrepresented. This imbalance poses a substantial barrier to achieving accurate and reliable DDI predictions. (2) the scarcity of labeled data for rare events, which is a pervasive issue in the medical field where rare yet potentially critical interactions are often overlooked or under-studied due to limited available data. In response, we offer DDIPrompt, an innovative panacea inspired by the recent advancements in graph prompting. Our framework aims to address these issues by leveraging the intrinsic knowledge from pre-trained models, which can be efficiently deployed with minimal downstream data. Specifically, to solve the first challenge, DDIPrompt employs augmented links between drugs, considering both structural and interactive proximity. It features a hierarchical pre-training strategy that comprehends intra-molecular structures and inter-molecular interactions, fostering a comprehensive and unbiased understanding of drug properties. For the second challenge, we implement a prototype-enhanced prompting mechanism during inference. This mechanism, refined by few-shot examples from each category, effectively harnesses the rich pre-training knowledge to enhance prediction accuracy, particularly for these rare but crucial interactions. Comprehensive evaluations on two benchmark datasets demonstrate the superiority of DDIPrompt, particularly in predicting rare DDI events.
- [1777] arXiv:2402.11485 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: LEIA: Facilitating Cross-Lingual Knowledge Transfer in Language Models with Entity-based Data AugmentationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Adapting English-based large language models (LLMs) to other languages has become increasingly popular due to the efficiency and potential of cross-lingual transfer. However, existing language adaptation methods often overlook the benefits of cross-lingual supervision. In this study, we introduce LEIA, a language adaptation tuning method that utilizes Wikipedia entity names aligned across languages. This method involves augmenting the target language corpus with English entity names and training the model using left-to-right language modeling. We assess LEIA on diverse question answering datasets using 7B-parameter LLMs, demonstrating significant performance gains across various non-English languages. The source code is available at this https URL .
- [1778] arXiv:2402.11498 (cross-list from cs.RO) [ pdf , ps , html , other ]
-
Title: Verifiably Following Complex Robot Instructions with Foundation ModelsSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: Enabling robots to follow complex natural language instructions is an important yet challenging problem. People want to flexibly express constraints, refer to arbitrary landmarks and verify behavior when instructing robots. Conversely, robots must disambiguate human instructions into specifications and ground instruction referents in the real world. We propose Language Instruction grounding for Motion Planning (LIMP), a system that leverages foundation models and temporal logics to generate instruction-conditioned semantic maps that enable robots to verifiably follow expressive and long-horizon instructions with open vocabulary referents and complex spatiotemporal constraints. In contrast to prior methods for using foundation models in robot task execution, LIMP constructs an explainable instruction representation that reveals the robot's alignment with an instructor's intended motives and affords the synthesis of robot behaviors that are correct-by-construction. We demonstrate LIMP in three real-world environments, across a set of 35 complex spatiotemporal instructions, showing the generality of our approach and the ease of deployment in novel unstructured domains. In our experiments, LIMP can spatially ground open-vocabulary referents and synthesize constraint-satisfying plans in 90% of object-goal navigation and 71% of mobile manipulation instructions. See supplementary videos at this https URL
- [1779] arXiv:2402.11505 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Federated Fine-tuning of Large Language Models under Heterogeneous Language Tasks and Client ResourcesComments: 14 pages, 8 figures, 8 tablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Federated Learning (FL) has recently been applied to the parameter-efficient fine-tuning of Large Language Models (LLMs). While promising, it raises significant challenges due to the heterogeneous resources and data distributions of clients.This study introduces FlexLoRA, a simple yet effective aggregation scheme for LLM fine-tuning, which mitigates the "buckets effect" in traditional FL that restricts the potential of clients with ample resources by tying them to the capabilities of the least-resourced participants. FlexLoRA allows for dynamic adjustment of local LoRA ranks, fostering the development of a global model imbued with broader, less task-specific knowledge. By synthesizing a full-size LoRA weight from individual client contributions and employing Singular Value Decomposition (SVD) for weight redistribution, FlexLoRA fully leverages heterogeneous client resources. Involving over 1,600 clients performing diverse NLP tasks, our experiments validate the efficacy of FlexLoRA, with the federated global model achieving up to a 3.1% average improvement in downstream NLP task performance. FlexLoRA's practicality is further underscored by its seamless integration with existing LoRA-based FL methods and theoretical analysis, offering a path toward scalable, privacy-preserving federated tuning for LLMs.
- [1780] arXiv:2402.11523 (cross-list from cs.IR) [ pdf , ps , html , other ]
-
Title: Neighborhood-Enhanced Supervised Contrastive Learning for Collaborative FilteringJournal-ref: IEEE TKDE, 2023Subjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI)
Abstract: While effective in recommendation tasks, collaborative filtering (CF) techniques face the challenge of data sparsity. Researchers have begun leveraging contrastive learning to introduce additional self-supervised signals to address this. However, this approach often unintentionally distances the target user/item from their collaborative neighbors, limiting its efficacy. In response, we propose a solution that treats the collaborative neighbors of the anchor node as positive samples within the final objective loss function. This paper focuses on developing two unique supervised contrastive loss functions that effectively combine supervision signals with contrastive loss. We analyze our proposed loss functions through the gradient lens, demonstrating that different positive samples simultaneously influence updating the anchor node's embeddings. These samples' impact depends on their similarities to the anchor node and the negative samples. Using the graph-based collaborative filtering model as our backbone and following the same data augmentation methods as the existing contrastive learning model SGL, we effectively enhance the performance of the recommendation model. Our proposed Neighborhood-Enhanced Supervised Contrastive Loss (NESCL) model substitutes the contrastive loss function in SGL with our novel loss function, showing marked performance improvement. On three real-world datasets, Yelp2018, Gowalla, and Amazon-Book, our model surpasses the original SGL by 10.09%, 7.09%, and 35.36% on NDCG@20, respectively.
- [1781] arXiv:2402.11534 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: PreAct: Predicting Future in ReAct Enhances Agent's Planning AbilityComments: 13 pages, 6 giguresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Addressing the discrepancies between predictions and actual outcomes often aids individuals in expanding their thought processes and engaging in reflection, thereby facilitating reasoning in the correct direction. In this paper, we introduce $\textbf{PreAct}$, an agent framework that integrates $\textbf{pre}$diction with $\textbf{rea}$soning and $\textbf{act}$ion. Leveraging the information provided by predictions, a large language model (LLM) based agent can offer more diversified and strategically oriented reasoning, which in turn leads to more effective actions that help the agent complete complex tasks. Our experiments demonstrate that PreAct outperforms the ReAct approach in accomplishing complex tasks and that PreAct can be co-enhanced when combined with Reflexion methods. We prompt the model with different numbers of historical predictions and find that historical predictions have a sustained positive effect on LLM planning. The differences in single-step reasoning between PreAct and ReAct show that PreAct indeed offers advantages in terms of diversity and strategic directivity over ReAct.
- [1782] arXiv:2402.11537 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Deciphering the Impact of Pretraining Data on Large Language Models through Machine UnlearningSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Through pretraining on a corpus with various sources, Large Language Models (LLMs) have gained impressive performance. However, the impact of each component of the pretraining corpus remains opaque. As a result, the organization of the pretraining corpus is still empirical and may deviate from the optimal. To address this issue, we systematically analyze the impact of 48 datasets from 5 major categories of pretraining data of LLMs and measure their impacts on LLMs using benchmarks about nine major categories of model capabilities. Our analyses provide empirical results about the contribution of multiple corpora on the performances of LLMs, along with their joint impact patterns, including complementary, orthogonal, and correlational relationships. We also identify a set of ``high-impact data'' such as Books that is significantly related to a set of model capabilities. These findings provide insights into the organization of data to support more efficient pretraining of LLMs.
- [1783] arXiv:2402.11541 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Counter-intuitive: Large Language Models Can Better Understand Knowledge Graphs Than We ThoughtComments: 13 pagesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Although the method of enhancing large language models' (LLMs') reasoning ability and reducing their hallucinations through the use of knowledge graphs (KGs) has received widespread attention, the exploration of how to enable LLMs to integrate the structured knowledge in KGs on-the-fly remains inadequate. Researchers often co-train KG embeddings and LLM parameters to equip LLMs with the ability of comprehending KG knowledge. However, this resource-hungry training paradigm significantly increases the model learning cost and is also unsuitable for non-open-source, black-box LLMs. In this paper, we employ complex question answering (CQA) as a task to assess the LLM's ability of comprehending KG knowledge. We conducted a comprehensive comparison of KG knowledge injection methods (from triples to natural language text), aiming to explore the optimal prompting method for supplying KG knowledge to LLMs, thereby enhancing their comprehension of KG. Contrary to our initial expectations, our analysis revealed that LLMs effectively handle messy, noisy, and linearized KG knowledge, outperforming methods that employ well-designed natural language (NL) textual prompts. This counter-intuitive finding provides substantial insights for future research on LLMs' comprehension of structured knowledge.
- [1784] arXiv:2402.11542 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Question Answering Over Spatio-Temporal Knowledge GraphComments: 11 pages, 4 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Spatio-temporal knowledge graphs (STKGs) extend the concept of knowledge graphs (KGs) by incorporating time and location information. While the research community's focus on Knowledge Graph Question Answering (KGQA), the field of answering questions incorporating both spatio-temporal information based on STKGs remains largely unexplored. Furthermore, a lack of comprehensive datasets also has hindered progress in this area. To address this issue, we present STQAD, a dataset comprising 10,000 natural language questions for spatio-temporal knowledge graph question answering (STKGQA). Unfortunately, various state-of-the-art KGQA approaches fall far short of achieving satisfactory performance on our dataset. In response, we propose STCQA, a new spatio-temporal KGQA approach that utilizes a novel STKG embedding method named STComplEx. By extracting temporal and spatial information from a question, our QA model can better comprehend the question and retrieve accurate answers from the STKG. Through extensive experiments, we demonstrate the quality of our dataset and the effectiveness of our STKGQA method.
- [1785] arXiv:2402.11549 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Syntactic Language Change in English and German: Metrics, Parsers, and ConvergencesComments: Updated to the current versionSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Many studies have shown that human languages tend to optimize for lower complexity and increased communication efficiency. Syntactic dependency distance, which measures the linear distance between dependent words, is often considered a key indicator of language processing difficulty and working memory load. The current paper looks at diachronic trends in syntactic language change in both English and German, using corpora of parliamentary debates from the last c. 160 years. We base our observations on five dependency parsers, including the widely used Stanford CoreNLP as well as 4 newer alternatives. Our analysis of syntactic language change goes beyond linear dependency distance and explores 15 metrics relevant to dependency distance minimization (DDM) and/or based on tree graph properties, such as the tree height and degree variance. Even though we have evidence that recent parsers trained on modern treebanks are not heavily affected by data 'noise' such as spelling changes and OCR errors in our historic data, we find that results of syntactic language change are sensitive to the parsers involved, which is a caution against using a single parser for evaluating syntactic language change as done in previous work. We also show that syntactic language change over the time period investigated is largely similar between English and German for the different metrics explored: only 4% of cases we examine yield opposite conclusions regarding upwards and downtrends of syntactic metrics across German and English. We also show that changes in syntactic measures seem to be more frequent at the tails of sentence length distributions. To our best knowledge, ours is the most comprehensive analysis of syntactic language change using modern NLP technology in recent corpora of English and German.
- [1786] arXiv:2402.11550 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: LongAgent: Scaling Language Models to 128k Context through Multi-Agent CollaborationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large language models (LLMs) have demonstrated impressive performance in understanding language and executing complex reasoning tasks. However, LLMs with long context windows have been notorious for their expensive training costs and high inference latency. Even the most advanced models such as GPT-4 and Claude2 often make mistakes when processing inputs of over $100k$ tokens, a phenomenon also known as \textit{lost in the middle}. In this paper, we propose \textsc{LongAgent}, a method based on multi-agent collaboration, which scales LLMs (e.g., LLaMA) to a context of 128K and demonstrates potential superiority in long-text processing compared to GPT-4. In \textsc{LongAgent}, a leader is responsible for understanding user intent and directing team members to acquire information from documents. Due to members' hallucinations, it is non-trivial for a leader to obtain accurate information from the responses of dozens to hundreds of members. To address this, we develop an \textit{inter-member communication} mechanism to resolve response conflicts caused by hallucinations through information sharing. Our experimental results indicate that \textsc{LongAgent} offers a promising alternative for long-text processing. The agent team instantiated with LLaMA-7B achieves significant improvements in tasks such as 128k-long text retrieval, multi-hop question answering, compared to GPT-4.
- [1787] arXiv:2402.11565 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Continual Learning on Graphs: Challenges, Solutions, and OpportunitiesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Continual learning on graph data has recently attracted paramount attention for its aim to resolve the catastrophic forgetting problem on existing tasks while adapting the sequentially updated model to newly emerged graph tasks. While there have been efforts to summarize progress on continual learning research over Euclidean data, e.g., images and texts, a systematic review of progress in continual learning on graphs, a.k.a, continual graph learning (CGL) or lifelong graph learning, is still demanding. Graph data are far more complex in terms of data structures and application scenarios, making CGL task settings, model designs, and applications extremely challenging. To bridge the gap, we provide a comprehensive review of existing continual graph learning (CGL) algorithms by elucidating the different task settings and categorizing the existing methods based on their characteristics. We compare the CGL methods with traditional continual learning techniques and analyze the applicability of the traditional continual learning techniques to CGL tasks. Additionally, we review the benchmark works that are crucial to CGL research. Finally, we discuss the remaining challenges and propose several future directions. We will maintain an up-to-date GitHub repository featuring a comprehensive list of CGL algorithms, accessible at this https URL .
- [1788] arXiv:2402.11569 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: Developing Autonomous Robot-Mediated Behavior Coaching Sessions with HaruComments: Accepted as Late Breaking Report (LBR) at the 19th Annual ACM/IEEE International Conference on Human Robot Interaction (HRI '24)Journal-ref: HRI '24 Companion, March 11-14, 2024, Boulder, CO, USASubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: This study presents an empirical investigation into the design and impact of autonomous dialogues in human-robot interaction for behavior change coaching. We focus on the use of Haru, a tabletop social robot, and explore the implementation of the Tiny Habits method for fostering positive behavior change. The core of our study lies in developing a fully autonomous dialogue system that maximizes Haru's emotional expressiveness and unique personality. Our methodology involved iterative design and extensive testing of the dialogue system, ensuring it effectively embodied the principles of the Tiny Habits method while also incorporating strategies for trust-raising and trust-dampening. The effectiveness of the final version of the dialogue was evaluated in an experimental study with human participants (N=12). The results indicated a significant improvement in perceptions of Haru's liveliness, interactivity, and neutrality. Additionally, our study contributes to the broader understanding of dialogue design in social robotics, offering practical insights for future developments in the field.
- [1789] arXiv:2402.11571 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: Ain't Misbehavin' -- Using LLMs to Generate Expressive Robot Behavior in Conversations with the Tabletop Robot HaruComments: Accepted as Late Breaking Report (LBR) at the 19th Annual ACM/IEEE International Conference on Human Robot Interaction (HRI '24)Journal-ref: Companion of HRI '24, March 11-14, 2024, Boulder, CO, USASubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Social robots aim to establish long-term bonds with humans through engaging conversation. However, traditional conversational approaches, reliant on scripted interactions, often fall short in maintaining engaging conversations. This paper addresses this limitation by integrating large language models (LLMs) into social robots to achieve more dynamic and expressive conversations. We introduce a fully-automated conversation system that leverages LLMs to generate robot responses with expressive behaviors, congruent with the robot's personality. We incorporate robot behavior with two modalities: 1) a text-to-speech (TTS) engine capable of various delivery styles, and 2) a library of physical actions for the robot. We develop a custom, state-of-the-art emotion recognition model to dynamically select the robot's tone of voice and utilize emojis from LLM output as cues for generating robot actions. A demo of our system is available here. To illuminate design and implementation issues, we conduct a pilot study where volunteers chat with a social robot using our proposed system, and we analyze their feedback, conducting a rigorous error analysis of chat transcripts. Feedback was overwhelmingly positive, with participants commenting on the robot's empathy, helpfulness, naturalness, and entertainment. Most negative feedback was due to automatic speech recognition (ASR) errors which had limited impact on conversations. However, we observed a small class of errors, such as the LLM repeating itself or hallucinating fictitious information and human responses, that have the potential to derail conversations, raising important issues for LLM application.
- [1790] arXiv:2402.11588 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: SDiT: Spiking Diffusion Model with TransformerSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Spiking neural networks (SNNs) have low power consumption and bio-interpretable characteristics, and are considered to have tremendous potential for energy-efficient computing. However, the exploration of SNNs on image generation tasks remains very limited, and a unified and effective structure for SNN-based generative models has yet to be proposed. In this paper, we explore a novel diffusion model architecture within spiking neural networks. We utilize transformer to replace the commonly used U-net structure in mainstream diffusion models. It can generate higher quality images with relatively lower computational cost and shorter sampling time. It aims to provide an empirical baseline for research of generative models based on SNNs. Experiments on MNIST, Fashion-MNIST, and CIFAR-10 datasets demonstrate that our work is highly competitive compared to existing SNN generative models.
- [1791] arXiv:2402.11594 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Simplifying Hyperparameter Tuning in Online Machine Learning -- The spotRiverGUISubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Batch Machine Learning (BML) reaches its limits when dealing with very large amounts of streaming data. This is especially true for available memory, handling drift in data streams, and processing new, unknown data. Online Machine Learning (OML) is an alternative to BML that overcomes the limitations of BML. OML is able to process data in a sequential manner, which is especially useful for data streams. The `river` package is a Python OML-library, which provides a variety of online learning algorithms for classification, regression, clustering, anomaly detection, and more. The `spotRiver` package provides a framework for hyperparameter tuning of OML models. The `spotRiverGUI` is a graphical user interface for the `spotRiver` package. The `spotRiverGUI` releases the user from the burden of manually searching for the optimal hyperparameter setting. After the data is provided, users can compare different OML algorithms from the powerful `river` package in a convenient way and tune the selected algorithms very efficiently.
- [1792] arXiv:2402.11622 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language ModelsComments: 14 pages, 11 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Object hallucination has been an Achilles' heel which hinders the broader applications of large vision-language models (LVLMs). Object hallucination refers to the phenomenon that the LVLMs claim non-existent objects in the image. To mitigate the object hallucinations, instruction tuning and external model-based detection methods have been proposed, which either require large-scare computational resources or depend on the detection result of external models. However, there remains an under-explored field to utilize the LVLM itself to alleviate object hallucinations. In this work, we adopt the intuition that the LVLM tends to respond logically consistently for existent objects but inconsistently for hallucinated objects. Therefore, we propose a Logical Closed Loop-based framework for Object Hallucination Detection and Mitigation, namely LogicCheckGPT. In specific, we devise logical consistency probing to raise questions with logical correlations, inquiring about attributes from objects and vice versa. Whether their responses can form a logical closed loop serves as an indicator of object hallucination. As a plug-and-play method, it can be seamlessly applied to all existing LVLMs. Comprehensive experiments conducted on three benchmarks across four LVLMs have demonstrated significant improvements brought by our method, indicating its effectiveness and generality.
- [1793] arXiv:2402.11639 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: In-Context Learning with Transformers: Softmax Attention Adapts to Function LipschitznessSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: A striking property of transformers is their ability to perform in-context learning (ICL), a machine learning framework in which the learner is presented with a novel context during inference implicitly through some data, and tasked with making a prediction in that context. As such that learner must adapt to the context without additional training. We explore the role of softmax attention in an ICL setting where each context encodes a regression task. We show that an attention unit learns a window that it uses to implement a nearest-neighbors predictor adapted to the landscape of the pretraining tasks. Specifically, we show that this window widens with decreasing Lipschitzness and increasing label noise in the pretraining tasks. We also show that on low-rank, linear problems, the attention unit learns to project onto the appropriate subspace before inference. Further, we show that this adaptivity relies crucially on the softmax activation and thus cannot be replicated by the linear activation often studied in prior theoretical analyses.
- [1794] arXiv:2402.11645 (cross-list from quant-ph) [ pdf , ps , other ]
-
Title: Quantum Image Denoising with Machine Learning: A Novel Approach to Improve Quantum Image Processing Quality and ReliabilityComments: 9 pages, 3 figuresSubjects: Quantum Physics (quant-ph) ; Artificial Intelligence (cs.AI)
Abstract: Quantum Image Processing (QIP) is a field that aims to utilize the benefits of quantum computing for manipulating and analyzing images. However, QIP faces two challenges: the limitation of qubits and the presence of noise in a quantum machine. In this research we propose a novel approach to address the issue of noise in QIP. By training and employing a machine learning model that identifies and corrects the noise in quantum processed images, we can compensate for the noisiness caused by the machine and retrieve a processing result similar to that performed by a classical computer with higher efficiency. The model is trained by learning a dataset consisting of both existing processed images and quantum processed images from open access datasets. This model will be capable of providing us with the confidence level for each pixel and its potential original value. To assess the model's accuracy in compensating for loss and decoherence in QIP, we evaluate it using three metrics: Peak Signal to Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Mean Opinion Score (MOS). Additionally, we discuss the applicability of our model across domains well as its cost effectiveness compared to alternative methods.
- [1795] arXiv:2402.11671 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Autocorrect for Estonian texts: final report from project EKTB25Agnes Luhtaru , Martin Vainikko , Krista Liin , Kais Allkivi-Metsoja , Jaagup Kippar , Pille Eslon , Mark FishelComments: in Estonian languageSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The project was funded in 2021-2023 by the National Programme of Estonian Language Technology. Its main aim was to develop spelling and grammar correction tools for the Estonian language. The main challenge was the very small amount of available error correction data needed for such development. To mitigate this, (1) we annotated more correction data for model training and testing, (2) we tested transfer-learning, i.e. retraining machine learning models created for other tasks, so as not to depend solely on correction data, (3) we compared the developed method and model with alternatives, including large language models. We also developed automatic evaluation, which can calculate the accuracy and yield of corrections by error category, so that the effectiveness of different methods can be compared in detail.
There has been a breakthrough in large language models during the project: GPT4, a commercial language model with Estonian-language support, has been created. We took into account the existence of the model when adjusting plans and in the report we present a comparison with the ability of GPT4 to improve the Estonian language text.
The final results show that the approach we have developed provides better scores than GPT4 and the result is usable but not entirely reliable yet. The report also contains ideas on how GPT4 and other major language models can be implemented in the future, focusing on open-source solutions.
All results of this project are open-data/open-source, with licenses that allow them to be used for purposes including commercial ones. - [1796] arXiv:2402.11676 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: A Multi-Aspect Framework for Counter Narrative Evaluation using Large Language ModelsComments: 22 pages, camera-ready version; references added, typos corrected, methodology section expanded, additional tableSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Counter narratives - informed responses to hate speech contexts designed to refute hateful claims and de-escalate encounters - have emerged as an effective hate speech intervention strategy. While previous work has proposed automatic counter narrative generation methods to aid manual interventions, the evaluation of these approaches remains underdeveloped. Previous automatic metrics for counter narrative evaluation lack alignment with human judgment as they rely on superficial reference comparisons instead of incorporating key aspects of counter narrative quality as evaluation criteria. To address prior evaluation limitations, we propose a novel evaluation framework prompting LLMs to provide scores and feedback for generated counter narrative candidates using 5 defined aspects derived from guidelines from counter narrative specialized NGOs. We found that LLM evaluators achieve strong alignment to human-annotated scores and feedback and outperform alternative metrics, indicating their potential as multi-aspect, reference-free and interpretable evaluators for counter narrative evaluation.
- [1797] arXiv:2402.11677 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: MultiCorrupt: A Multi-Modal Robustness Dataset and Benchmark of LiDAR-Camera Fusion for 3D Object DetectionComments: Code: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Multi-modal 3D object detection models for automated driving have demonstrated exceptional performance on computer vision benchmarks like nuScenes. However, their reliance on densely sampled LiDAR point clouds and meticulously calibrated sensor arrays poses challenges for real-world applications. Issues such as sensor misalignment, miscalibration, and disparate sampling frequencies lead to spatial and temporal misalignment in data from LiDAR and cameras. Additionally, the integrity of LiDAR and camera data is often compromised by adverse environmental conditions such as inclement weather, leading to occlusions and noise interference. To address this challenge, we introduce MultiCorrupt, a comprehensive benchmark designed to evaluate the robustness of multi-modal 3D object detectors against ten distinct types of corruptions. We evaluate five state-of-the-art multi-modal detectors on MultiCorrupt and analyze their performance in terms of their resistance ability. Our results show that existing methods exhibit varying degrees of robustness depending on the type of corruption and their fusion strategy. We provide insights into which multi-modal design choices make such models robust against certain perturbations. The dataset generation code and benchmark are open-sourced at this https URL .
- [1798] arXiv:2402.11680 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: 3D Point Cloud Compression with Recurrent Neural Network and Image Compression MethodsTill Beemelmanns , Yuchen Tao , Bastian Lampe , Lennart Reiher , Raphael van Kempen , Timo Woopen , Lutz EcksteinComments: Code: this https URLJournal-ref: 2022 IEEE Intelligent Vehicles Symposium (IV)Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Abstract: Storing and transmitting LiDAR point cloud data is essential for many AV applications, such as training data collection, remote control, cloud services or SLAM. However, due to the sparsity and unordered structure of the data, it is difficult to compress point cloud data to a low volume. Transforming the raw point cloud data into a dense 2D matrix structure is a promising way for applying compression algorithms. We propose a new lossless and calibrated 3D-to-2D transformation which allows compression algorithms to efficiently exploit spatial correlations within the 2D representation. To compress the structured representation, we use common image compression methods and also a self-supervised deep compression approach using a recurrent neural network. We also rearrange the LiDAR's intensity measurements to a dense 2D representation and propose a new metric to evaluate the compression performance of the intensity. Compared to approaches that are based on generic octree point cloud compression or based on raw point cloud data compression, our approach achieves the best quantitative and visual performance. Source code and dataset are available at this https URL .
- [1799] arXiv:2402.11684 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: ALLaVA: Harnessing GPT4V-synthesized Data for A Lite Vision-Language ModelGuiming Hardy Chen , Shunian Chen , Ruifei Zhang , Junying Chen , Xiangbo Wu , Zhiyi Zhang , Zhihong Chen , Jianquan Li , Xiang Wan , Benyou WangComments: 19 pagesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Recent advancements in Large Vision-Language Models (LVLMs) have enabled processing of multimodal inputs in language models but require significant computational resources for deployment, especially in edge devices. This study aims to bridge the performance gap between traditional-scale LVLMs and resource-friendly lite versions by adopting high-quality training data. To do this, a synthetic dataset is created by leveraging GPT-4V's ability to generate detailed captions, complex reasoning instructions and detailed answers from images. The resulted model trained with our data, ALLaVA, achieves competitive performance on 12 benchmarks up to 3B LVLMs. This work highlights the feasibility of adopting high-quality data in crafting more efficient LVLMs. Our online demo is available at \url{ this https URL }.
- [1800] arXiv:2402.11702 (cross-list from cs.SE) [ pdf , ps , html , other ]
-
Title: Can ChatGPT Support Developers? An Empirical Evaluation of Large Language Models for Code GenerationComments: 4 pages, 3 figures, 21st International Conference on Mining Software Repositories (MSR '24), April 15-16, 2024, Lisbon, PortugalSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Large language models (LLMs) have demonstrated notable proficiency in code generation, with numerous prior studies showing their promising capabilities in various development scenarios. However, these studies mainly provide evaluations in research settings, which leaves a significant gap in understanding how effectively LLMs can support developers in real-world. To address this, we conducted an empirical analysis of conversations in DevGPT, a dataset collected from developers' conversations with ChatGPT (captured with the Share Link feature on platforms such as GitHub). Our empirical findings indicate that the current practice of using LLM-generated code is typically limited to either demonstrating high-level concepts or providing examples in documentation, rather than to be used as production-ready code. These findings indicate that there is much future work needed to improve LLMs in code generation before they can be integral parts of modern software development.
- [1801] arXiv:2402.11707 (cross-list from cs.IR) [ pdf , ps , other ]
-
Title: Search Engines Post-ChatGPT: How Generative Artificial Intelligence Could Make Search Less ReliableComments: 9 pages, 5 figuresSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: In this commentary, we discuss the evolving nature of search engines, as they begin to generate, index, and distribute content created by generative artificial intelligence (GenAI). Our discussion highlights challenges in the early stages of GenAI integration, particularly around factual inconsistencies and biases. We discuss how output from GenAI carries an unwarranted sense of credibility, while decreasing transparency and sourcing ability. Furthermore, search engines are already answering queries with error-laden, generated content, further blurring the provenance of information and impacting the integrity of the information ecosystem. We argue how all these factors could reduce the reliability of search engines. Finally, we summarize some of the active research directions and open questions.
- [1802] arXiv:2402.11709 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: GNNavi: Navigating the Information Flow in Large Language Models by Graph Neural NetworkComments: 15 pages, 9 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large Language Models (LLMs) exhibit strong In-Context Learning (ICL) capabilities when prompts with demonstrations are applied to them. However, fine-tuning still remains crucial to further enhance their adaptability. Prompt-based fine-tuning proves to be an effective fine-tuning method in low-data scenarios, but high demands on computing resources limit its practicality. We address this issue by introducing a prompt-based parameter-efficient fine-tuning (PEFT) approach. GNNavi leverages insights into ICL's information flow dynamics, which indicates that label words act in prompts as anchors for information propagation. GNNavi employs a Graph Neural Network (GNN) layer to precisely guide the aggregation and distribution of information flow during the processing of prompts by hardwiring the desired information flow into the GNN. Our experiments on text classification tasks with GPT-2 and Llama2 shows GNNavi surpasses standard prompt-based fine-tuning methods in few-shot settings by updating just 0.2% to 0.5% of parameters. We compare GNNavi with prevalent PEFT approaches, such as prefix tuning, LoRA and Adapter in terms of performance and efficiency. Our analysis reveals that GNNavi enhances information flow and ensures a clear aggregation process.
- [1803] arXiv:2402.11729 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Prospector Heads: Generalized Feature Attribution for Large Models & DataGautam Machiraju , Alexander Derry , Arjun Desai , Neel Guha , Amir-Hossein Karimi , James Zou , Russ Altman , Christopher Ré , Parag MallickComments: 22 pages, 12 figures, 6 tablesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)
Abstract: Feature attribution, the ability to localize regions of the input data that are relevant for classification, is an important capability for machine learning models in scientific and biomedical domains. Current methods for feature attribution, which rely on "explaining" the predictions of end-to-end classifiers, suffer from imprecise feature localization and are inadequate for use with small sample sizes and high-dimensional datasets due to computational challenges. We introduce prospector heads, an efficient and interpretable alternative to explanation-based methods for feature attribution that can be applied to any encoder and any data modality. Prospector heads generalize across modalities through experiments on sequences (text), images (pathology), and graphs (protein structures), outperforming baseline attribution methods by up to 49 points in mean localization AUPRC. We also demonstrate how prospector heads enable improved interpretation and discovery of class-specific patterns in the input data. Through their high performance, flexibility, and generalizability, prospectors provide a framework for improving trust and transparency for machine learning models in complex domains.
- [1804] arXiv:2402.11733 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: The Effectiveness of Random Forgetting for Robust GeneralizationComments: Published as a conference paper at ICLR 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Deep neural networks are susceptible to adversarial attacks, which can compromise their performance and accuracy. Adversarial Training (AT) has emerged as a popular approach for protecting neural networks against such attacks. However, a key challenge of AT is robust overfitting, where the network's robust performance on test data deteriorates with further training, thus hindering generalization. Motivated by the concept of active forgetting in the brain, we introduce a novel learning paradigm called "Forget to Mitigate Overfitting (FOMO)". FOMO alternates between the forgetting phase, which randomly forgets a subset of weights and regulates the model's information through weight reinitialization, and the relearning phase, which emphasizes learning generalizable features. Our experiments on benchmark datasets and adversarial attacks show that FOMO alleviates robust overfitting by significantly reducing the gap between the best and last robust test accuracy while improving the state-of-the-art robustness. Furthermore, FOMO provides a better trade-off between standard and robust accuracy, outperforming baseline adversarial methods. Finally, our framework is robust to AutoAttacks and increases generalization in many real-world scenarios.
- [1805] arXiv:2402.11734 (cross-list from cs.PL) [ pdf , ps , html , other ]
-
Title: Solving Data-centric Tasks using Large Language ModelsShraddha Barke , Christian Poelitz , Carina Suzana Negreanu , Benjamin Zorn , José Cambronero , Andrew D. Gordon , Vu Le , Elnaz Nouri , Nadia Polikarpova , Advait Sarkar , Brian Slininger , Neil Toronto , Jack WilliamsComments: Paper accepted to NAACL 2024 (Findings)Subjects: Programming Languages (cs.PL) ; Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Abstract: Large language models (LLMs) are rapidly replacing help forums like StackOverflow, and are especially helpful for non-professional programmers and end users. These users are often interested in data-centric tasks, such as spreadsheet manipulation and data wrangling, which are hard to solve if the intent is only communicated using a natural-language description, without including the data. But how do we decide how much data and which data to include in the prompt? This paper makes two contributions towards answering this question. First, we create a dataset of real-world NL-to-code tasks manipulating tabular data, mined from StackOverflow posts. Second, we introduce a cluster-then-select prompting technique, which adds the most representative rows from the input data to the LLM prompt. Our experiments show that LLM performance is indeed sensitive to the amount of data passed in the prompt, and that for tasks with a lot of syntactic variation in the input table, our cluster-then-select technique outperforms a random selection baseline.
- [1806] arXiv:2402.11737 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Compression Repair for Feedforward Neural Networks Based on Model Equivalence EvaluationComments: ACC 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In this paper, we propose a method of repairing compressed Feedforward Neural Networks (FNNs) based on equivalence evaluation of two neural networks. In the repairing framework, a novel neural network equivalence evaluation method is developed to compute the output discrepancy between two neural networks. The output discrepancy can quantitatively characterize the output difference produced by compression procedures. Based on the computed output discrepancy, the repairing method first initializes a new training set for the compressed networks to narrow down the discrepancy between the two neural networks and improve the performance of the compressed network. Then, we repair the compressed FNN by re-training based on the training set. We apply our developed method to the MNIST dataset to demonstrate the effectiveness and advantages of our proposed repair method.
- [1807] arXiv:2402.11746 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task ArithmeticSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Aligned language models face a significant limitation as their fine-tuning often results in compromised safety. To tackle this, we propose a simple method RESTA that performs LLM safety realignment. RESTA stands for REstoring Safety through Task Arithmetic. At its core, it involves a simple arithmetic addition of a safety vector to the weights of the compromised model. We demonstrate the effectiveness of RESTA in both parameter-efficient and full fine-tuning, covering a wide range of downstream tasks, including instruction following in Chinese, English, and Hindi, as well as problem-solving capabilities in Code and Math. We also showcase the generalizability of RESTA on three existing safety evaluation benchmarks and a multilingual benchmark dataset proposed as a part of this work, consisting of 550 harmful questions covering 11 categories, each with 5 sub-categories of harm. Overall, RESTA decreases the harmfulness of the compromised model from 18.6% to 5.1% and from 9.2% to 1.5% in parameter-efficient and full fine-tuning, respectively, while maintaining most of the model's performance on the task. We release the source codes at: this https URL .
- [1808] arXiv:2402.11752 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Diagonalisation SGD: Fast & Convergent SGD for Non-Differentiable Models via Reparameterisation and SmoothingSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Abstract: It is well-known that the reparameterisation gradient estimator, which exhibits low variance in practice, is biased for non-differentiable models. This may compromise correctness of gradient-based optimisation methods such as stochastic gradient descent (SGD). We introduce a simple syntactic framework to define non-differentiable functions piecewisely and present a systematic approach to obtain smoothings for which the reparameterisation gradient estimator is unbiased. Our main contribution is a novel variant of SGD, Diagonalisation Stochastic Gradient Descent, which progressively enhances the accuracy of the smoothed approximation during optimisation, and we prove convergence to stationary points of the unsmoothed (original) objective. Our empirical evaluation reveals benefits over the state of the art: our approach is simple, fast, stable and attains orders of magnitude reduction in work-normalised variance.
- [1809] arXiv:2402.11753 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMsFengqing Jiang , Zhangchen Xu , Luyao Niu , Zhen Xiang , Bhaskar Ramasubramanian , Bo Li , Radha PoovendranSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Safety is critical to the usage of large language models (LLMs). Multiple techniques such as data filtering and supervised fine-tuning have been developed to strengthen LLM safety. However, currently known techniques presume that corpora used for safety alignment of LLMs are solely interpreted by semantics. This assumption, however, does not hold in real-world applications, which leads to severe vulnerabilities in LLMs. For example, users of forums often use ASCII art, a form of text-based art, to convey image information. In this paper, we propose a novel ASCII art-based jailbreak attack and introduce a comprehensive benchmark Vision-in-Text Challenge (ViTC) to evaluate the capabilities of LLMs in recognizing prompts that cannot be solely interpreted by semantics. We show that five SOTA LLMs (GPT-3.5, GPT-4, Gemini, Claude, and Llama2) struggle to recognize prompts provided in the form of ASCII art. Based on this observation, we develop the jailbreak attack ArtPrompt, which leverages the poor performance of LLMs in recognizing ASCII art to bypass safety measures and elicit undesired behaviors from LLMs. ArtPrompt only requires black-box access to the victim LLMs, making it a practical attack. We evaluate ArtPrompt on five SOTA LLMs, and show that ArtPrompt can effectively and efficiently induce undesired behaviors from all five LLMs. Our code is available at this https URL .
- [1810] arXiv:2402.11764 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: ChatGPT Based Data Augmentation for Improved Parameter-Efficient Debiasing of LLMsComments: Accepted to EACL 2024 Workshop on Language Technology for Equality, Diversity, Inclusion (LT-EDI-2024)Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: Large Language models (LLMs), while powerful, exhibit harmful social biases. Debiasing is often challenging due to computational costs, data constraints, and potential degradation of multi-task language capabilities. This work introduces a novel approach utilizing ChatGPT to generate synthetic training data, aiming to enhance the debiasing of LLMs. We propose two strategies: Targeted Prompting, which provides effective debiasing for known biases but necessitates prior specification of bias in question; and General Prompting, which, while slightly less effective, offers debiasing across various categories. We leverage resource-efficient LLM debiasing using adapter tuning and compare the effectiveness of our synthetic data to existing debiasing datasets. Our results reveal that: (1) ChatGPT can efficiently produce high-quality training data for debiasing other LLMs; (2) data produced via our approach surpasses existing datasets in debiasing performance while also preserving internal knowledge of a pre-trained LLM; and (3) synthetic data exhibits generalizability across categories, effectively mitigating various biases, including intersectional ones. These findings underscore the potential of synthetic data in advancing the fairness of LLMs with minimal retraining cost.
- [1811] arXiv:2402.11771 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Evaluating the Effectiveness of Index-Based Treatment AllocationSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Methodology (stat.ME); Machine Learning (stat.ML)
Abstract: When resources are scarce, an allocation policy is needed to decide who receives a resource. This problem occurs, for instance, when allocating scarce medical resources and is often solved using modern ML methods. This paper introduces methods to evaluate index-based allocation policies -- that allocate a fixed number of resources to those who need them the most -- by using data from a randomized control trial. Such policies create dependencies between agents, which render the assumptions behind standard statistical tests invalid and limit the effectiveness of estimators. Addressing these challenges, we translate and extend recent ideas from the statistics literature to present an efficient estimator and methods for computing asymptotically correct confidence intervals. This enables us to effectively draw valid statistical conclusions, a critical gap in previous work. Our extensive experiments validate our methodology in practical settings, while also showcasing its statistical power. We conclude by proposing and empirically verifying extensions of our methodology that enable us to reevaluate a past randomized control trial to evaluate different ML allocation policies in the context of a mHealth program, drawing previously invisible conclusions.
- [1812] arXiv:2402.11773 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Dynamic Multi-Network Mining of Tensor Time SeriesComments: Accepted by WWW 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Information Theory (cs.IT)
Abstract: Subsequence clustering of time series is an essential task in data mining, and interpreting the resulting clusters is also crucial since we generally do not have prior knowledge of the data. Thus, given a large collection of tensor time series consisting of multiple modes, including timestamps, how can we achieve subsequence clustering for tensor time series and provide interpretable insights? In this paper, we propose a new method, Dynamic Multi-network Mining (DMM), that converts a tensor time series into a set of segment groups of various lengths (i.e., clusters) characterized by a dependency network constrained with l1-norm. Our method has the following properties. (a) Interpretable: it characterizes the cluster with multiple networks, each of which is a sparse dependency network of a corresponding non-temporal mode, and thus provides visible and interpretable insights into the key relationships. (b) Accurate: it discovers the clusters with distinct networks from tensor time series according to the minimum description length (MDL). (c) Scalable: it scales linearly in terms of the input data size when solving a non-convex problem to optimize the number of segments and clusters, and thus it is applicable to long-range and high-dimensional tensors. Extensive experiments with synthetic datasets confirm that our method outperforms the state-of-the-art methods in terms of clustering accuracy. We then use real datasets to demonstrate that DMM is useful for providing interpretable insights from tensor time series.
- [1813] arXiv:2402.11777 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Uncovering Latent Human Wellbeing in Language Model EmbeddingsComments: 10 pages, 5 figures, 1 tableSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Do language models implicitly learn a concept of human wellbeing? We explore this through the ETHICS Utilitarianism task, assessing if scaling enhances pretrained models' representations. Our initial finding reveals that, without any prompt engineering or finetuning, the leading principal component from OpenAI's text-embedding-ada-002 achieves 73.9% accuracy. This closely matches the 74.6% of BERT-large finetuned on the entire ETHICS dataset, suggesting pretraining conveys some understanding about human wellbeing. Next, we consider four language model families, observing how Utilitarianism accuracy varies with increased parameters. We find performance is nondecreasing with increased model size when using sufficient numbers of principal components.
- [1814] arXiv:2402.11778 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Towards Theoretical Understandings of Self-Consuming Generative ModelsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: This paper tackles the emerging challenge of training generative models within a self-consuming loop, wherein successive generations of models are recursively trained on mixtures of real and synthetic data from previous generations. We construct a theoretical framework to rigorously evaluate how this training regimen impacts the data distributions learned by future models. Specifically, we derive bounds on the total variation (TV) distance between the synthetic data distributions produced by future models and the original real data distribution under various mixed training scenarios. Our analysis demonstrates that this distance can be effectively controlled under the condition that mixed training dataset sizes or proportions of real data are large enough. Interestingly, we further unveil a phase transition induced by expanding synthetic data amounts, proving theoretically that while the TV distance exhibits an initial ascent, it declines beyond a threshold point. Finally, we specialize our general results to diffusion models, delivering nuanced insights such as the efficacy of optimal early stopping within the self-consuming loop.
- [1815] arXiv:2402.11780 (cross-list from cs.AR) [ pdf , ps , html , other ]
-
Title: CiMNet: Towards Joint Optimization for DNN Architecture and Configuration for Compute-In-Memory HardwareComments: 6 pages, 4 figures, 5 tables; Accepted as a full paper by the tinyML Research Symposium 2024Subjects: Hardware Architecture (cs.AR) ; Artificial Intelligence (cs.AI)
Abstract: With the recent growth in demand for large-scale deep neural networks, compute in-memory (CiM) has come up as a prominent solution to alleviate bandwidth and on-chip interconnect bottlenecks that constrain Von-Neuman architectures. However, the construction of CiM hardware poses a challenge as any specific memory hierarchy in terms of cache sizes and memory bandwidth at different interfaces may not be ideally matched to any neural network's attributes such as tensor dimension and arithmetic intensity, thus leading to suboptimal and under-performing systems. Despite the success of neural architecture search (NAS) techniques in yielding efficient sub-networks for a given hardware metric budget (e.g., DNN execution time or latency), it assumes the hardware configuration to be frozen, often yielding sub-optimal sub-networks for a given budget. In this paper, we present CiMNet, a framework that jointly searches for optimal sub-networks and hardware configurations for CiM architectures creating a Pareto optimal frontier of downstream task accuracy and execution metrics (e.g., latency). The proposed framework can comprehend the complex interplay between a sub-network's performance and the CiM hardware configuration choices including bandwidth, processing element size, and memory size. Exhaustive experiments on different model architectures from both CNN and Transformer families demonstrate the efficacy of the CiMNet in finding co-optimized sub-networks and CiM hardware configurations. Specifically, for similar ImageNet classification accuracy as baseline ViT-B, optimizing only the model architecture increases performance (or reduces workload execution time) by 1.7x while optimizing for both the model architecture and hardware configuration increases it by 3.1x.
- [1816] arXiv:2402.11788 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: MM-SurvNet: Deep Learning-Based Survival Risk Stratification in Breast Cancer Through Multimodal Data FusionComments: Keywords: Multimodal Fusion, Breast Cancer, Whole Slide Images, Survival PredictionSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Survival risk stratification is an important step in clinical decision making for breast cancer management. We propose a novel deep learning approach for this purpose by integrating histopathological imaging, genetic and clinical data. It employs vision transformers, specifically the MaxViT model, for image feature extraction, and self-attention to capture intricate image relationships at the patient level. A dual cross-attention mechanism fuses these features with genetic data, while clinical data is incorporated at the final layer to enhance predictive accuracy. Experiments on the public TCGA-BRCA dataset show that our model, trained using the negative log likelihood loss function, can achieve superior performance with a mean C-index of 0.64, surpassing existing methods. This advancement facilitates tailored treatment strategies, potentially leading to improved patient outcomes.
- [1817] arXiv:2402.11793 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Generative Kaleidoscopic NetworksSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: We discovered that the neural networks, especially the deep ReLU networks, demonstrate an `over-generalization' phenomenon. That is, the output values for the inputs that were not seen during training are mapped close to the output range that were observed during the learning process. In other words, the neural networks learn a many-to-one mapping and this effect is more prominent as we increase the number of layers or the depth of the neural network. We utilize this property of neural networks to design a dataset kaleidoscope, termed as `Generative Kaleidoscopic Networks'. Briefly, if we learn a model to map from input $x\in\mathbb{R}^D$ to itself $f_\mathcal{N}(x)\rightarrow x$, the proposed `Kaleidoscopic sampling' procedure starts with a random input noise $z\in\mathbb{R}^D$ and recursively applies $f_\mathcal{N}(\cdots f_\mathcal{N}(z)\cdots )$. After a burn-in period duration, we start observing samples from the input distribution and the quality of samples recovered improves as we increase the depth of the model. Scope: We observed this phenomenon to various degrees for the other deep learning architectures like CNNs, Transformers & U-Nets and we are currently investigating them further.
- [1818] arXiv:2402.11800 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian SamplingArman Adibi , Nicolo Dal Fabbro , Luca Schenato , Sanjeev Kulkarni , H. Vincent Poor , George J. Pappas , Hamed Hassani , Aritra MitraComments: Accepted to the 27th International Conference on Artificial Intelligence and Statistics (AISTATS) 2024!Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Systems and Control (eess.SY); Optimization and Control (math.OC)
Abstract: Motivated by applications in large-scale and multi-agent reinforcement learning, we study the non-asymptotic performance of stochastic approximation (SA) schemes with delayed updates under Markovian sampling. While the effect of delays has been extensively studied for optimization, the manner in which they interact with the underlying Markov process to shape the finite-time performance of SA remains poorly understood. In this context, our first main contribution is to show that under time-varying bounded delays, the delayed SA update rule guarantees exponentially fast convergence of the \emph{last iterate} to a ball around the SA operator's fixed point. Notably, our bound is \emph{tight} in its dependence on both the maximum delay $\tau_{max}$, and the mixing time $\tau_{mix}$. To achieve this tight bound, we develop a novel inductive proof technique that, unlike various existing delayed-optimization analyses, relies on establishing uniform boundedness of the iterates. As such, our proof may be of independent interest. Next, to mitigate the impact of the maximum delay on the convergence rate, we provide the first finite-time analysis of a delay-adaptive SA scheme under Markovian sampling. In particular, we show that the exponent of convergence of this scheme gets scaled down by $\tau_{avg}$, as opposed to $\tau_{max}$ for the vanilla delayed SA rule; here, $\tau_{avg}$ denotes the average delay across all iterations. Moreover, the adaptive scheme requires no prior knowledge of the delay sequence for step-size tuning. Our theoretical findings shed light on the finite-time effects of delays for a broad class of algorithms, including TD learning, Q-learning, and stochastic gradient descent under Markovian sampling.
- [1819] arXiv:2402.11809 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct DecodingSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This research aims to accelerate the inference speed of large language models (LLMs) with billions of parameters. We propose \textbf{S}mart \textbf{P}arallel \textbf{A}uto-\textbf{C}orrect d\textbf{E}coding (SPACE), an innovative approach designed for achieving lossless acceleration of LLMs. By integrating semi-autoregressive inference and speculative decoding capabilities, SPACE uniquely enables autoregressive LLMs to parallelize token generation and verification. This is realized through a specialized semi-autoregressive supervised fine-tuning process that equips existing LLMs with the ability to simultaneously predict multiple tokens. Additionally, an auto-correct decoding algorithm facilitates the simultaneous generation and verification of token sequences within a single model invocation. Through extensive experiments on a range of LLMs, SPACE has demonstrated inference speedup ranging from 2.7x-4.0x on HumanEval-X while maintaining output quality.
- [1820] arXiv:2402.11813 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: A novel framework for adaptive stress testing of autonomous vehicles in highwaysSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: Guaranteeing the safe operations of autonomous vehicles (AVs) is crucial for their widespread adoption and public acceptance. It is thus of a great significance to not only assess the AV against the standard safety tests, but also discover potential corner cases of the AV under test that could lead to unsafe behaviour or scenario. In this paper, we propose a novel framework to systematically explore corner cases that can result in safety concerns in a highway traffic scenario. The framework is based on an adaptive stress testing (AST) approach, an emerging validation method that leverages a Markov decision process to formulate the scenarios and deep reinforcement learning (DRL) to discover the desirable patterns representing corner cases. To this end, we develop a new reward function for DRL to guide the AST in identifying crash scenarios based on the collision probability estimate between the AV under test (i.e., the ego vehicle) and the trajectory of other vehicles on the highway. The proposed framework is further integrated with a new driving model enabling us to create more realistic traffic scenarios capturing both the longitudinal and lateral movements of vehicles on the highway. In our experiment, we calibrate our model using real-world crash statistics involving automated vehicles in California, and then we analyze the characteristics of the AV and the framework. Quantitative and qualitative analyses of our experimental results demonstrate that our framework outperforms other existing AST schemes. The study can help discover crash scenarios of AV that are unknown or absent in human driving, thereby enhancing the safety and trustworthiness of AV technology.
- [1821] arXiv:2402.11815 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: HU at SemEval-2024 Task 8A: Can Contrastive Learning Learn Embeddings to Detect Machine-Generated Text?Comments: Camera Ready Version - Accepted in SemEval 2024 (Colocated with NAACL 2024)Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This paper describes our system developed for SemEval-2024 Task 8, ``Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text Detection'' Machine-generated texts have been one of the main concerns due to the use of large language models (LLM) in fake text generation, phishing, cheating in exams, or even plagiarizing copyright materials. A lot of systems have been developed to detect machine-generated text. Nonetheless, the majority of these systems rely on the text-generating model. This limitation is impractical in real-world scenarios, as it's often impossible to know which specific model the user has used for text generation. In this work, we propose a $\textbf{single}$ model based on contrastive learning, which uses $\textbf{$\approx$40% of the baseline's parameters}$ (149M vs. 355M) but shows a comparable performance on the test dataset $(\textbf{21st out of 137 participants})$. Our key finding is that even without an ensemble of multiple models, a single base model can have comparable performance with the help of data augmentation and contrastive learning. Our code is publicly available at this https URL .
- [1822] arXiv:2402.11818 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Where It Really Matters: Few-Shot Environmental Conservation Media Monitoring for Low-Resource LanguagesSameer Jain , Sedrick Scott Keh , Shova Chettri , Karun Dewan , Pablo Izquierdo , Johanna Prussman , Pooja Shreshtha , Cesar Suarez , Zheyuan Ryan Shi , Lei Li , Fei FangComments: AAAI 2024: AI for Social Impact TrackSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: Environmental conservation organizations routinely monitor news content on conservation in protected areas to maintain situational awareness of developments that can have an environmental impact. Existing automated media monitoring systems require large amounts of data labeled by domain experts, which is only feasible at scale for high-resource languages like English. However, such tools are most needed in the global south where news of interest is mainly in local low-resource languages, and far fewer experts are available to annotate datasets sustainably. In this paper, we propose NewsSerow, a method to automatically recognize environmental conservation content in low-resource languages. NewsSerow is a pipeline of summarization, in-context few-shot classification, and self-reflection using large language models (LLMs). Using at most 10 demonstration example news articles in Nepali, NewsSerow significantly outperforms other few-shot methods and achieves comparable performance with models fully fine-tuned using thousands of examples. The World Wide Fund for Nature (WWF) has deployed NewsSerow for media monitoring in Nepal, significantly reducing their operational burden, and ensuring that AI tools for conservation actually reach the communities that need them the most. NewsSerow has also been deployed for countries with other languages like Colombia.
- [1823] arXiv:2402.11842 (cross-list from cs.SE) [ pdf , ps , other ]
-
Title: CodeArt: Better Code Models by Attention Regularization When Symbols Are LackingSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: Transformer based code models have impressive performance in many software engineering tasks. However, their effectiveness degrades when symbols are missing or not informative. The reason is that the model may not learn to pay attention to the right correlations/contexts without the help of symbols. We propose a new method to pre-train general code models when symbols are lacking. We observe that in such cases, programs degenerate to something written in a very primitive language. We hence propose to use program analysis to extract contexts a priori (instead of relying on symbols and masked language modeling as in vanilla models). We then leverage a novel attention masking method to only allow the model attending to these contexts, e.g., bi-directional program dependence transitive closures and token co-occurrences. In the meantime, the inherent self-attention mechanism is utilized to learn which of the allowed attentions are more important compared to others. To realize the idea, we enhance the vanilla tokenization and model architecture of a BERT model, construct and utilize attention masks, and introduce a new pre-training algorithm. We pre-train this BERT-like model from scratch, using a dataset of 26 million stripped binary functions with explicit program dependence information extracted by our tool. We apply the model in three downstream tasks: binary similarity, type inference, and malware family classification. Our pre-trained model can improve the SOTAs in these tasks from 53% to 64%, 49% to 60%, and 74% to 94%, respectively. It also substantially outperforms other general pre-training techniques of code understanding models.
- [1824] arXiv:2402.11866 (cross-list from cs.CG) [ pdf , ps , html , other ]
-
Title: Two Online Map Matching Algorithms Based on Analytic Hierarchy Process and Fuzzy LogicJeremy J. Lin , Tomoro Mochida , Riley C. W. O'Neill , Atsuro Yoshida , Masashi Yamazaki , Akinobu SasadaComments: 25 pages, 27 figuresSubjects: Computational Geometry (cs.CG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Our aim of this paper is to develop new map matching algorithms and to improve upon previous work. We address two key approaches: Analytic Hierarchy Process (AHP) map matching and fuzzy logic map matching. AHP is a decision-making method that combines mathematical analysis with human judgment, and fuzzy logic is an approach to computing based on the degree of truth and aims at modeling the imprecise modes of reasoning from 0 to 1 rather than the usual boolean logic. Of these algorithms, the way of our applying AHP to map matching is newly developed in this paper, meanwhile, our application of fuzzy logic to map matching is mostly the same as existing research except for some small changes. Because of the common characteristic that both methods are designed to handle imprecise information and simplicity for implementation, we decided to use these methods.
- [1825] arXiv:2402.11871 (cross-list from cs.RO) [ pdf , ps , html , other ]
-
Title: From Reals to Logic and Back: Inventing Symbolic Vocabularies, Actions, and Models for Planning from Raw DataSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: Hand-crafted, logic-based state and action representations have been widely used to overcome the intractable computational complexity of long-horizon robot planning problems, including task and motion planning problems. However, creating such representations requires experts with strong intuitions and detailed knowledge about the robot and the tasks it may need to accomplish in a given setting. Removing this dependency on human intuition is a highly active research area.
This paper presents the first approach for autonomously learning generalizable, logic-based relational representations for abstract states and actions starting from unannotated high-dimensional, real-valued robot trajectories. The learned representations constitute auto-invented PDDL-like domain models. Empirical results in deterministic settings show that powerful abstract representations can be learned from just a handful of robot trajectories; the learned relational representations include but go beyond classical, intuitive notions of high-level actions; and that the learned models allow planning algorithms to scale to tasks that were previously beyond the scope of planning without hand-crafted abstractions. - [1826] arXiv:2402.11877 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Finite-Time Error Analysis of Online Model-Based Q-Learning with a Relaxed Sampling ModelSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Reinforcement learning has witnessed significant advancements, particularly with the emergence of model-based approaches. Among these, $Q$-learning has proven to be a powerful algorithm in model-free settings. However, the extension of $Q$-learning to a model-based framework remains relatively unexplored. In this paper, we delve into the sample complexity of $Q$-learning when integrated with a model-based approach. Through theoretical analyses and empirical evaluations, we seek to elucidate the conditions under which model-based $Q$-learning excels in terms of sample efficiency compared to its model-free counterpart.
- [1827] arXiv:2402.11886 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: The Colorful Future of LLMs: Evaluating and Improving LLMs as Emotional Supporters for Queer YouthShir Lissak , Nitay Calderon , Geva Shenkman , Yaakov Ophir , Eyal Fruchter , Anat Brunstein Klomek , Roi ReichartSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Queer youth face increased mental health risks, such as depression, anxiety, and suicidal ideation. Hindered by negative stigma, they often avoid seeking help and rely on online resources, which may provide incompatible information. Although access to a supportive environment and reliable information is invaluable, many queer youth worldwide have no access to such support. However, this could soon change due to the rapid adoption of Large Language Models (LLMs) such as ChatGPT. This paper aims to comprehensively explore the potential of LLMs to revolutionize emotional support for queers. To this end, we conduct a qualitative and quantitative analysis of LLM's interactions with queer-related content. To evaluate response quality, we develop a novel ten-question scale that is inspired by psychological standards and expert input. We apply this scale to score several LLMs and human comments to posts where queer youth seek advice and share experiences. We find that LLM responses are supportive and inclusive, outscoring humans. However, they tend to be generic, not empathetic enough, and lack personalization, resulting in nonreliable and potentially harmful advice. We discuss these challenges, demonstrate that a dedicated prompt can improve the performance, and propose a blueprint of an LLM-supporter that actively (but sensitively) seeks user context to provide personalized, empathetic, and reliable responses. Our annotated dataset is available for further research.
- [1828] arXiv:2402.11892 (cross-list from cs.SE) [ pdf , ps , other ]
-
Title: Evaluating Program Repair with Semantic-Preserving Transformations: A Naturalness AssessmentSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: In this paper, we investigate the naturalness of semantic-preserving transformations and their impacts on the evaluation of NPR. To achieve this, we conduct a two-stage human study, including (1) interviews with senior software developers to establish the first concrete criteria for assessing the naturalness of code transformations and (2) a survey involving 10 developers to assess the naturalness of 1178 transformations, i.e., pairs of original and transformed programs, applied to 225 real-world bugs. Our findings reveal that nearly 60% and 20% of these transformations are considered natural and unnatural with substantially high agreement among human annotators. Furthermore, the unnatural code transformations introduce a 25.2% false alarm rate on robustness of five well-known NPR systems. Additionally, the performance of the NPR systems drops notably when evaluated using natural transformations, i.e., a drop of up to 22.9% and 23.6% in terms of the numbers of correct and plausible patches generated by these systems. These results highlight the importance of robustness testing by considering naturalness of code transformations, which unveils true effectiveness of NPR systems. Finally, we conduct an exploration study on automating the assessment of naturalness of code transformations by deriving a new naturalness metric based on Cross-Entropy. Based on our naturalness metric, we can effectively assess naturalness for code transformations automatically with an AUC of 0.7.
- [1829] arXiv:2402.11903 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: SoLA: Solver-Layer Adaption of LLM for Better Logic ReasoningSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Considering the challenges faced by large language models (LLMs) on logical reasoning, prior efforts have sought to transform problem-solving through tool learning. While progress has been made on small-scale problems, solving industrial cases remains difficult due to their large scale and intricate expressions. In this paper, we propose a novel solver-layer adaptation (SoLA) method, where we introduce a solver as a new layer of the LLM to differentially guide solutions towards satisfiability. In SoLA, LLM aims to comprehend the search space described in natural language and identify local solutions of the highest quality, while the solver layer focuses solely on constraints not satisfied by the initial solution. Leveraging MaxSAT as a bridge, we define forward and backward transfer gradients, enabling the final model to converge to a satisfied solution or prove unsatisfiability. The backdoor theory ensures that SoLA can obtain accurate solutions within polynomial loops. We evaluate the performance of SoLA on various datasets and empirically demonstrate its consistent outperformance against existing symbolic solvers (including Z3 and Kissat) and tool-learning methods in terms of efficiency in large-scale problem-solving.
- [1830] arXiv:2402.11925 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Energy-Efficient Edge Learning via Joint Data Deepening-and-PrefetchingComments: accepted for publication in IEEE Transactions on Wireless Communications. arXiv admin note: text overlap with arXiv:2211.07146Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Information Theory (cs.IT)
Abstract: The vision of pervasive artificial intelligence (AI) services can be realized by training an AI model on time using real-time data collected by internet of things (IoT) devices. To this end, IoT devices require offloading their data to an edge server in proximity. However, transmitting high-dimensional and voluminous data from energy-constrained IoT devices poses a significant challenge. To address this limitation, we propose a novel offloading architecture, called joint data deepening-and-prefetching (JD2P), which is feature-by-feature offloading comprising two key techniques. The first one is data deepening, where each data sample's features are sequentially offloaded in the order of importance determined by the data embedding technique such as principle component analysis (PCA). Offloading is terminated once the already transmitted features are sufficient for accurate data classification, resulting in a reduction in the amount of transmitted data. The criteria to offload data are derived for binary and multi-class classifiers, which are designed based on support vector machine (SVM) and deep neural network (DNN), respectively. The second one is data prefetching, where some features potentially required in the future are offloaded in advance, thus achieving high efficiency via precise prediction and parameter optimization. We evaluate the effectiveness of JD2P through experiments using the MNIST dataset, and the results demonstrate its significant reduction in expected energy consumption compared to several benchmarks without degrading learning accuracy.
- [1831] arXiv:2402.11934 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Team QUST at SemEval-2024 Task 8: A Comprehensive Study of Monolingual and Multilingual Approaches for Detecting AI-generated TextSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: This paper presents the participation of team QUST in Task 8 SemEval 2024. We first performed data augmentation and cleaning on the dataset to enhance model training efficiency and accuracy. In the monolingual task, we evaluated traditional deep-learning methods, multiscale positive-unlabeled framework (MPU), fine-tuning, adapters and ensemble methods. Then, we selected the top-performing models based on their accuracy from the monolingual models and evaluated them in subtasks A and B. The final model construction employed a stacking ensemble that combined fine-tuning with MPU. Our system achieved 8th (scored 8th in terms of accuracy, officially ranked 13th) place in the official test set in multilingual settings of subtask A. We release our system code at: this https URL
- [1832] arXiv:2402.11948 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Mini-Hes: A Parallelizable Second-order Latent Factor Analysis ModelComments: 6 pagesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: Interactions among large number of entities is naturally high-dimensional and incomplete (HDI) in many big data related tasks. Behavioral characteristics of users are hidden in these interactions, hence, effective representation of the HDI data is a fundamental task for understanding user behaviors. Latent factor analysis (LFA) model has proven to be effective in representing HDI data. The performance of an LFA model relies heavily on its training process, which is a non-convex optimization. It has been proven that incorporating local curvature and preprocessing gradients during its training process can lead to superior performance compared to LFA models built with first-order family methods. However, with the escalation of data volume, the feasibility of second-order algorithms encounters challenges. To address this pivotal issue, this paper proposes a mini-block diagonal hessian-free (Mini-Hes) optimization for building an LFA model. It leverages the dominant diagonal blocks in the generalized Gauss-Newton matrix based on the analysis of the Hessian matrix of LFA model and serves as an intermediary strategy bridging the gap between first-order and second-order optimization methods. Experiment results indicate that, with Mini-Hes, the LFA model outperforms several state-of-the-art models in addressing missing data estimation task on multiple real HDI datasets from recommender system. (The source code of Mini-Hes is available at this https URL )
- [1833] arXiv:2402.11955 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Analysis of Multidomain Abstractive Summarization Using Salience AllocationComments: 11 pages, 1 figure, 4 tablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: This paper explores the realm of abstractive text summarization through the lens of the SEASON (Salience Allocation as Guidance for Abstractive SummarizatiON) technique, a model designed to enhance summarization by leveraging salience allocation techniques. The study evaluates SEASON's efficacy by comparing it with prominent models like BART, PEGASUS, and ProphetNet, all fine-tuned for various text summarization tasks. The assessment is conducted using diverse datasets including CNN/Dailymail, SAMSum, and Financial-news based Event-Driven Trading (EDT), with a specific focus on a financial dataset containing a substantial volume of news articles from 2020/03/01 to 2021/05/06. This paper employs various evaluation metrics such as ROUGE, METEOR, BERTScore, and MoverScore to evaluate the performance of these models fine-tuned for generating abstractive summaries. The analysis of these metrics offers a thorough insight into the strengths and weaknesses demonstrated by each model in summarizing news dataset, dialogue dataset and financial text dataset. The results presented in this paper not only contribute to the evaluation of the SEASON model's effectiveness but also illuminate the intricacies of salience allocation techniques across various types of datasets.
- [1834] arXiv:2402.11960 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: DB-LLM: Accurate Dual-Binarization for Efficient LLMsHong Chen , Chengtao Lv , Liang Ding , Haotong Qin , Xiabin Zhou , Yifu Ding , Xuebo Liu , Min Zhang , Jinyang Guo , Xianglong Liu , Dacheng TaoSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Large language models (LLMs) have significantly advanced the field of natural language processing, while the expensive memory and computation consumption impede their practical deployment. Quantization emerges as one of the most effective methods for improving the computational efficiency of LLMs. However, existing ultra-low-bit quantization always causes severe accuracy drops. In this paper, we empirically relieve the micro and macro characteristics of ultra-low bit quantization and present a novel Dual-Binarization method for LLMs, namely DB-LLM. For the micro-level, we take both the accuracy advantage of 2-bit-width and the efficiency advantage of binarization into account, introducing Flexible Dual Binarization (FDB). By splitting 2-bit quantized weights into two independent sets of binaries, FDB ensures the accuracy of representations and introduces flexibility, utilizing the efficient bitwise operations of binarization while retaining the inherent high sparsity of ultra-low bit quantization. For the macro-level, we find the distortion that exists in the prediction of LLM after quantization, which is specified as the deviations related to the ambiguity of samples. We propose the Deviation-Aware Distillation (DAD) method, enabling the model to focus differently on various samples. Comprehensive experiments show that our DB-LLM not only significantly surpasses the current State-of-The-Art (SoTA) in ultra-low bit quantization (eg, perplexity decreased from 9.64 to 7.23), but also achieves an additional 20\% reduction in computational consumption compared to the SOTA method under the same bit-width. Our code will be released soon.
- [1835] arXiv:2402.11963 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Imbalance in Regression DatasetsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: For classification, the problem of class imbalance is well known and has been extensively studied. In this paper, we argue that imbalance in regression is an equally important problem which has so far been overlooked: Due to under- and over-representations in a data set's target distribution, regressors are prone to degenerate to naive models, systematically neglecting uncommon training data and over-representing targets seen often during training. We analyse this problem theoretically and use resulting insights to develop a first definition of imbalance in regression, which we show to be a generalisation of the commonly employed imbalance measure in classification. With this, we hope to turn the spotlight on the overlooked problem of imbalance in regression and to provide common ground for future research.
- [1836] arXiv:2402.11984 (cross-list from cs.NE) [ pdf , ps , other ]
-
Title: Hebbian Learning based Orthogonal Projection for Continual Learning of Spiking Neural NetworksComments: Accepted by ICLR 2024Subjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Neuromorphic computing with spiking neural networks is promising for energy-efficient artificial intelligence (AI) applications. However, different from humans who continually learn different tasks in a lifetime, neural network models suffer from catastrophic forgetting. How could neuronal operations solve this problem is an important question for AI and neuroscience. Many previous studies draw inspiration from observed neuroscience phenomena and propose episodic replay or synaptic metaplasticity, but they are not guaranteed to explicitly preserve knowledge for neuron populations. Other works focus on machine learning methods with more mathematical grounding, e.g., orthogonal projection on high dimensional spaces, but there is no neural correspondence for neuromorphic computing. In this work, we develop a new method with neuronal operations based on lateral connections and Hebbian learning, which can protect knowledge by projecting activity traces of neurons into an orthogonal subspace so that synaptic weight update will not interfere with old tasks. We show that Hebbian and anti-Hebbian learning on recurrent lateral connections can effectively extract the principal subspace of neural activities and enable orthogonal projection. This provides new insights into how neural circuits and Hebbian learning can help continual learning, and also how the concept of orthogonal projection can be realized in neuronal systems. Our method is also flexible to utilize arbitrary training methods based on presynaptic activities/traces. Experiments show that our method consistently solves forgetting for spiking neural networks with nearly zero forgetting under various supervised training methods with different error propagation approaches, and outperforms previous approaches under various settings. Our method can pave a solid path for building continual neuromorphic computing systems.
- [1837] arXiv:2402.11997 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Remember This Event That Year? Assessing Temporal Information and Reasoning in Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Large Language Models (LLMs) are increasingly becoming ubiquitous, yet their ability to reason about and retain temporal information remains limited. This hinders their application in real-world scenarios where understanding the sequential nature of events is crucial. This paper experiments with state-of-the-art models on a novel, large-scale temporal dataset, \textbf{TempUN}, to reveal significant limitations in temporal retention and reasoning abilities. Interestingly, closed-source models indicate knowledge gaps more frequently, potentially suggesting a trade-off between uncertainty awareness and incorrect responses. Further, exploring various fine-tuning approaches yielded no major performance improvements. The associated dataset and code are available at the following URL ( this https URL ).
- [1838] arXiv:2402.12008 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Cluster Metric Sensitivity to Irrelevant FeaturesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: Clustering algorithms are used extensively in data analysis for data exploration and discovery. Technological advancements lead to continually growth of data in terms of volume, dimensionality and complexity. This provides great opportunities in data analytics as the data can be interrogated for many different purposes. This however leads challenges, such as identification of relevant features for a given task. In supervised tasks, one can utilise a number of methods to optimise the input features for the task objective (e.g. classification accuracy). In unsupervised problems, such tools are not readily available, in part due to an inability to quantify feature relevance in unlabeled tasks. In this paper, we investigate the sensitivity of clustering performance noisy uncorrelated variables iteratively added to baseline datasets with well defined clusters. We show how different types of irrelevant variables can impact the outcome of a clustering result from $k$-means in different ways. We observe a resilience to very high proportions of irrelevant features for adjusted rand index (ARI) and normalised mutual information (NMI) when the irrelevant features are Gaussian distributed. For Uniformly distributed irrelevant features, we notice the resilience of ARI and NMI is dependent on the dimensionality of the data and exhibits tipping points between high scores and near zero. Our results show that the Silhouette Coefficient and the Davies-Bouldin score are the most sensitive to irrelevant added features exhibiting large changes in score for comparably low proportions of irrelevant features regardless of underlying distribution or data scaling. As such the Silhouette Coefficient and the Davies-Bouldin score are good candidates for optimising feature selection in unsupervised clustering tasks.
- [1839] arXiv:2402.12010 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Training Green AI Models Using Elite SamplesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Abstract: The substantial increase in AI model training has considerable environmental implications, mandating more energy-efficient and sustainable AI practices. On the one hand, data-centric approaches show great potential towards training energy-efficient AI models. On the other hand, instance selection methods demonstrate the capability of training AI models with minimised training sets and negligible performance degradation. Despite the growing interest in both topics, the impact of data-centric training set selection on energy efficiency remains to date unexplored. This paper presents an evolutionary-based sampling framework aimed at (i) identifying elite training samples tailored for datasets and model pairs, (ii) comparing model performance and energy efficiency gains against typical model training practice, and (iii) investigating the feasibility of this framework for fostering sustainable model training practices. To evaluate the proposed framework, we conducted an empirical experiment including 8 commonly used AI classification models and 25 publicly available datasets. The results showcase that by considering 10% elite training samples, the models' performance can show a 50% improvement and remarkable energy savings of 98% compared to the common training practice.
- [1840] arXiv:2402.12023 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: Evaluation of ChatGPT's Smart Contract Auditing Capabilities Based on Chain of ThoughtComments: 21 pages, 10 figuresSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: Smart contracts, as a key component of blockchain technology, play a crucial role in ensuring the automation of transactions and adherence to protocol rules. However, smart contracts are susceptible to security vulnerabilities, which, if exploited, can lead to significant asset losses. This study explores the potential of enhancing smart contract security audits using the GPT-4 model. We utilized a dataset of 35 smart contracts from the SolidiFI-benchmark vulnerability library, containing 732 vulnerabilities, and compared it with five other vulnerability detection tools to evaluate GPT-4's ability to identify seven common types of vulnerabilities. Moreover, we assessed GPT-4's performance in code parsing and vulnerability capture by simulating a professional auditor's auditing process using CoT(Chain of Thought) prompts based on the audit reports of eight groups of smart contracts. We also evaluated GPT-4's ability to write Solidity Proof of Concepts (PoCs). Through experimentation, we found that GPT-4 performed poorly in detecting smart contract vulnerabilities, with a high Precision of 96.6%, but a low Recall of 37.8%, and an F1-score of 41.1%, indicating a tendency to miss vulnerabilities during detection. Meanwhile, it demonstrated good contract code parsing capabilities, with an average comprehensive score of 6.5, capable of identifying the background information and functional relationships of smart contracts; in 60% of the cases, it could write usable PoCs, suggesting GPT-4 has significant potential application in PoC writing. These experimental results indicate that GPT-4 lacks the ability to detect smart contract vulnerabilities effectively, but its performance in contract code parsing and PoC writing demonstrates its significant potential as an auxiliary tool in enhancing the efficiency and effectiveness of smart contract security audits.
- [1841] arXiv:2402.12026 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Acquiring Clean Language Models from Backdoor Poisoned Datasets by Downscaling Frequency SpaceSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: Despite the notable success of language models (LMs) in various natural language processing (NLP) tasks, the reliability of LMs is susceptible to backdoor attacks. Prior research attempts to mitigate backdoor learning while training the LMs on the poisoned dataset, yet struggles against complex backdoor attacks in real-world scenarios. In this paper, we investigate the learning mechanisms of backdoor LMs in the frequency space by Fourier analysis. Our findings indicate that the backdoor mapping presented on the poisoned datasets exhibits a more discernible inclination towards lower frequency compared to clean mapping, resulting in the faster convergence of backdoor mapping. To alleviate this dilemma, we propose Multi-Scale Low-Rank Adaptation (MuScleLoRA), which deploys multiple radial scalings in the frequency space with low-rank adaptation to the target model and further aligns the gradients when updating parameters. Through downscaling in the frequency space, MuScleLoRA encourages the model to prioritize the learning of relatively high-frequency clean mapping, consequently mitigating backdoor learning. Experimental results demonstrate that MuScleLoRA outperforms baselines significantly. Notably, MuScleLoRA reduces the average success rate of diverse backdoor attacks to below 15\% across multiple datasets and generalizes to various backbone LMs, including BERT, RoBERTa, and Llama2. The codes are available at this https URL .
- [1842] arXiv:2402.12035 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Class-incremental Learning for Time Series: Benchmark and EvaluationZhongzheng Qiao , Quang Pham , Zhen Cao , Hoang H Le , P.N.Suganthan , Xudong Jiang , Ramasamy SavithaComments: Currently under review for KDD 2024 (ADS track)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Real-world environments are inherently non-stationary, frequently introducing new classes over time. This is especially common in time series classification, such as the emergence of new disease classification in healthcare or the addition of new activities in human activity recognition. In such cases, a learning system is required to assimilate novel classes effectively while avoiding catastrophic forgetting of the old ones, which gives rise to the Class-incremental Learning (CIL) problem. However, despite the encouraging progress in the image and language domains, CIL for time series data remains relatively understudied. Existing studies suffer from inconsistent experimental designs, necessitating a comprehensive evaluation and benchmarking of methods across a wide range of datasets. To this end, we first present an overview of the Time Series Class-incremental Learning (TSCIL) problem, highlight its unique challenges, and cover the advanced methodologies. Further, based on standardized settings, we develop a unified experimental framework that supports the rapid development of new algorithms, easy integration of new datasets, and standardization of the evaluation process. Using this framework, we conduct a comprehensive evaluation of various generic and time-series-specific CIL methods in both standard and privacy-sensitive scenarios. Our extensive experiments not only provide a standard baseline to support future research but also shed light on the impact of various design factors such as normalization layers or memory budget thresholds. Codes are available at this https URL .
- [1843] arXiv:2402.12042 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Linear bandits with polylogarithmic minimax regretComments: 39 pages, 3 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: We study a noise model for linear stochastic bandits for which the subgaussian noise parameter vanishes linearly as we select actions on the unit sphere closer and closer to the unknown vector. We introduce an algorithm for this problem that exhibits a minimax regret scaling as $\log^3(T)$ in the time horizon $T$, in stark contrast the square root scaling of this regret for typical bandit algorithms. Our strategy, based on weighted least-squares estimation, achieves the eigenvalue relation $\lambda_{\min} ( V_t ) = \Omega (\sqrt{\lambda_{\max}(V_t ) })$ for the design matrix $V_t$ at each time step $t$ through geometrical arguments that are independent of the noise model and might be of independent interest. This allows us to tightly control the expected regret in each time step to be of the order $O(\frac1{t})$, leading to the logarithmic scaling of the cumulative regret.
- [1844] arXiv:2402.12061 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: All Language Models Large and SmallSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Many leading language models (LMs) use high-intensity computational resources both during training and execution. This poses the challenge of lowering resource costs for deployment and faster execution of decision-making tasks among others. We introduce a novel plug-and-play LM framework named Language Optimising Network Distribution (LONDI) framework. LONDI learns to selectively employ large LMs only where complex decision-making and reasoning are required while using low-resource LMs everywhere else. LONDI consists of a system of two (off-)policy networks, an LM, a large LM (LLM), and a reinforcement learning module that uses switching controls to quickly learn which system states to call the LLM. We then introduce a variant of LONDI that maintains budget constraints on LLM calls and hence its resource usage. Theoretically, we prove LONDI learns the subset of system states to activate the LLM required to solve the task. We then prove that LONDI converges to optimal solutions while also preserving budgetary constraints on LLM calls almost surely enabling it to solve various tasks while significantly lowering computational costs. We test LONDI's performance in a range of tasks in ScienceWorld and BabyAI-Text and demonstrate that LONDI can solve tasks only solvable by resource-intensive LLMs while reducing GPU usage by up to 30%.
- [1845] arXiv:2402.12062 (cross-list from cs.CY) [ pdf , ps , html , other ]
-
Title: Causal Equal Protection as Algorithmic FairnessComments: 18 pages, 5 figuresSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
Abstract: Over the last ten years the literature in computer science and philosophy has formulated different criteria of algorithmic fairness. One of the most discussed, classification parity, requires that the erroneous classifications of a predictive algorithm occur with equal frequency for groups picked out by protected characteristics. Despite its intuitive appeal, classification parity has come under attack. Multiple scenarios can be imagined in which - intuitively - a predictive algorithm does not treat any individual unfairly, and yet classification parity is violated. To make progress, we turn to a related principle, equal protection, originally developed in the context of criminal justice. Key to equal protection is equalizing the risks of erroneous classifications (in a sense to be specified) as opposed to equalizing the rates of erroneous classifications. We show that equal protection avoids many of the counterexamples to classification parity, but also fails to model our moral intuitions in a number of common scenarios, for example, when the predictor is causally downstream relative to the protected characteristic. To address these difficulties, we defend a novel principle, causal equal protection, that models the fair allocation of the risks of erroneous classification through the lenses of causality.
- [1846] arXiv:2402.12065 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: WKVQuant: Quantizing Weight and Key/Value Cache for Large Language Models Gains MoreComments: Frist work to exclusively quantize weight and Key/Value cache for large language modelsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Large Language Models (LLMs) face significant deployment challenges due to their substantial memory requirements and the computational demands of auto-regressive text generation process. This paper addresses these challenges by focusing on the quantization of LLMs, a technique that reduces memory consumption by converting model parameters and activations into low-bit integers. We critically analyze the existing quantization approaches, identifying their limitations in balancing the accuracy and efficiency of the quantized LLMs. To advance beyond these limitations, we propose WKVQuant, a PTQ framework especially designed for quantizing weights and the key/value (KV) cache of LLMs. Specifically, we incorporates past-only quantization to improve the computation of attention. Additionally, we introduce two-dimensional quantization strategy to handle the distribution of KV cache, along with a cross-block reconstruction regularization for parameter optimization. Experiments show that WKVQuant achieves almost comparable memory savings to weight-activation quantization, while also approaching the performance of weight-only quantization.
- [1847] arXiv:2402.12071 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: EmoBench: Evaluating the Emotional Intelligence of Large Language ModelsSahand Sabour , Siyang Liu , Zheyuan Zhang , June M. Liu , Jinfeng Zhou , Alvionna S. Sunaryo , Juanzi Li , Tatia M.C. Lee , Rada Mihalcea , Minlie HuangComments: Work in progressSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Recent advances in Large Language Models (LLMs) have highlighted the need for robust, comprehensive, and challenging benchmarks. Yet, research on evaluating their Emotional Intelligence (EI) is considerably limited. Existing benchmarks have two major shortcomings: first, they mainly focus on emotion recognition, neglecting essential EI capabilities such as emotion regulation and thought facilitation through emotion understanding; second, they are primarily constructed from existing datasets, which include frequent patterns, explicit information, and annotation errors, leading to unreliable evaluation. We propose EmoBench, a benchmark that draws upon established psychological theories and proposes a comprehensive definition for machine EI, including Emotional Understanding and Emotional Application. EmoBench includes a set of 400 hand-crafted questions in English and Chinese, which are meticulously designed to require thorough reasoning and understanding. Our findings reveal a considerable gap between the EI of existing LLMs and the average human, highlighting a promising direction for future research. Our code and data will be publicly available from this https URL .
- [1848] arXiv:2402.12091 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Do Large Language Models Understand Logic or Just Mimick Context?Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Over the past few years, the abilities of large language models (LLMs) have received extensive attention, which have performed exceptionally well in complicated scenarios such as logical reasoning and symbolic inference. A significant factor contributing to this progress is the benefit of in-context learning and few-shot prompting. However, the reasons behind the success of such models using contextual reasoning have not been fully explored. Do LLMs have understand logical rules to draw inferences, or do they ``guess'' the answers by learning a type of probabilistic mapping through context? This paper investigates the reasoning capabilities of LLMs on two logical reasoning datasets by using counterfactual methods to replace context text and modify logical concepts. Based on our analysis, it is found that LLMs do not truly understand logical rules; rather, in-context learning has simply enhanced the likelihood of these models arriving at the correct answers. If one alters certain words in the context text or changes the concepts of logical terms, the outputs of LLMs can be significantly disrupted, leading to counter-intuitive responses. This work provides critical insights into the limitations of LLMs, underscoring the need for more robust mechanisms to ensure reliable logical reasoning in LLMs.
- [1849] arXiv:2402.12098 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Towards Explainable LiDAR Point Cloud Semantic Segmentation via Gradient Based Target LocalizationSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Semantic Segmentation (SS) of LiDAR point clouds is essential for many applications, such as urban planning and autonomous driving. While much progress has been made in interpreting SS predictions for images, interpreting point cloud SS predictions remains a challenge. This paper introduces pGS-CAM, a novel gradient-based method for generating saliency maps in neural network activation layers. Inspired by Grad-CAM, which uses gradients to highlight local importance, pGS-CAM is robust and effective on a variety of datasets (SemanticKITTI, Paris-Lille3D, DALES) and 3D deep learning architectures (KPConv, RandLANet). Our experiments show that pGS-CAM effectively accentuates the feature learning in intermediate activations of SS architectures by highlighting the contribution of each point. This allows us to better understand how SS models make their predictions and identify potential areas for improvement. Relevant codes are available at this https URL .
- [1850] arXiv:2402.12100 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Groot: Adversarial Testing for Generative Text-to-Image Models with Tree-based Semantic TransformationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Software Engineering (cs.SE)
Abstract: With the prevalence of text-to-image generative models, their safety becomes a critical concern. adversarial testing techniques have been developed to probe whether such models can be prompted to produce Not-Safe-For-Work (NSFW) content. However, existing solutions face several challenges, including low success rate and inefficiency. We introduce Groot, the first automated framework leveraging tree-based semantic transformation for adversarial testing of text-to-image models. Groot employs semantic decomposition and sensitive element drowning strategies in conjunction with LLMs to systematically refine adversarial prompts. Our comprehensive evaluation confirms the efficacy of Groot, which not only exceeds the performance of current state-of-the-art approaches but also achieves a remarkable success rate (93.66%) on leading text-to-image models such as DALL-E 3 and Midjourney.
- [1851] arXiv:2402.12102 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Is It a Free Lunch for Removing Outliers during Pretraining?Comments: 5 pages, 3 figures, 1 tableSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: With the growing size of large language models, the role of quantization becomes increasingly significant. However, outliers present in weights or activations notably influence the performance of quantized models. Recently, \citet{qtransformer} introduced a novel softmax function aimed at pretraining models in an outlier-free manner, thereby enhancing their suitability for quantization. Interestingly, we observed that such an approach leads to performance degradation in full precision. Building on this insight, we enhance the method by ensuring its normalization is invariant to sequence length, a crucial factor for bridging the gap between pretraining and fine-tuning. Moreover, this improved method also facilitates successful pretraining of causal language models.
- [1852] arXiv:2402.12118 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: DualView: Data Attribution from the Dual PerspectiveSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Local data attribution (or influence estimation) techniques aim at estimating the impact that individual data points seen during training have on particular predictions of an already trained Machine Learning model during test time. Previous methods either do not perform well consistently across different evaluation criteria from literature, are characterized by a high computational demand, or suffer from both. In this work we present DualView, a novel method for post-hoc data attribution based on surrogate modelling, demonstrating both high computational efficiency, as well as good evaluation results. With a focus on neural networks, we evaluate our proposed technique using suitable quantitative evaluation strategies from the literature against related principal local data attribution methods. We find that DualView requires considerably lower computational resources than other methods, while demonstrating comparable performance to competing approaches across evaluation metrics. Futhermore, our proposed method produces sparse explanations, where sparseness can be tuned via a hyperparameter. Finally, we showcase that with DualView, we can now render explanations from local data attributions compatible with established local feature attribution methods: For each prediction on (test) data points explained in terms of impactful samples from the training set, we are able to compute and visualize how the prediction on (test) sample relates to each influential training sample in terms of features recognized and by the model. We provide an Open Source implementation of DualView online, together with implementations for all other local data attribution methods we compare against, as well as the metrics reported here, for full reproducibility.
- [1853] arXiv:2402.12121 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Evaluating Image Review Ability of Vision Language ModelsShigeki Saito , Kazuki Hayashi , Yusuke Ide , Yusuke Sakai , Kazuma Onishi , Toma Suzuki , Seiji Gobara , Hidetaka Kamigaito , Katsuhiko Hayashi , Taro WatanabeComments: 9pages, under reviewingSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Abstract: Large-scale vision language models (LVLMs) are language models that are capable of processing images and text inputs by a single model. This paper explores the use of LVLMs to generate review texts for images. The ability of LVLMs to review images is not fully understood, highlighting the need for a methodical evaluation of their review abilities. Unlike image captions, review texts can be written from various perspectives such as image composition and exposure. This diversity of review perspectives makes it difficult to uniquely determine a single correct review for an image. To address this challenge, we introduce an evaluation method based on rank correlation analysis, in which review texts are ranked by humans and LVLMs, then, measures the correlation between these rankings. We further validate this approach by creating a benchmark dataset aimed at assessing the image review ability of recent LVLMs. Our experiments with the dataset reveal that LVLMs, particularly those with proven superiority in other evaluative contexts, excel at distinguishing between high-quality and substandard image reviews.
- [1854] arXiv:2402.12146 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Meta Ranking: Less Capable Language Models are Capable for Single Response JudgementComments: Preprint, under review. 25 pagesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Although Large Language Models (LLMs) have demonstrated strong performance on a wide range of tasks, they still face reliability challenges such as hallucination. Previous studies reveal that highly capable LLMs like GPT-4 are effective in judging the reliability of individual responses, while less capable ones are often tuned to evaluate the relative reliability of responses to the same query. To enable less capable LLMs to effectively judge the reliability of individual responses, we propose a novel method named $\textit{Meta}$ $\textit{Ranking}$ (MR). Unlike previous methods, which assess the response directly, we achieve the judgement by comparing the target query-response pair with reference query-response pairs. We found its remarkable effectiveness in error detection for LLM responses on reasoning tasks, where less capable LLMs could outperform strong baselines, even without fine-tuning. We further demonstrate that MR can be used to enhance the performance of LLMs in two practical applications: query routing and iterative training data filtering. The former achieves GPT-4-turbo comparable performance with less than half the token consumption, while the latter makes the instruction-tuned LLaMA-7B and Phi-2, a 2.7B model, significantly surpass Alpaca-13B over fewer training samples, underscoring the high potential of our proposed method.
- [1855] arXiv:2402.12147 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Surprising Efficacy of Fine-Tuned Transformers for Fact-Checking over Larger Language ModelsComments: Accepted in SIGIR 2024 (industry track)Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In this paper, we explore the challenges associated with establishing an end-to-end fact-checking pipeline in a real-world context, covering over 90 languages. Our real-world experimental benchmarks demonstrate that fine-tuning Transformer models specifically for fact-checking tasks, such as claim detection and veracity prediction, provide superior performance over large language models (LLMs) like GPT-4, GPT-3.5-Turbo, and Mistral-7b. However, we illustrate that LLMs excel in generative tasks such as question decomposition for evidence retrieval. Through extensive evaluation, we show the efficacy of fine-tuned models for fact-checking in a multilingual setting and complex claims that include numerical quantities.
- [1856] arXiv:2402.12150 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Your Large Language Model is Secretly a Fairness Proponent and You Should Prompt it Like OneSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The widespread adoption of large language models (LLMs) underscores the urgent need to ensure their fairness. However, LLMs frequently present dominant viewpoints while ignoring alternative perspectives from minority parties, resulting in potential biases. We hypothesize that these fairness-violating behaviors occur because LLMs express their viewpoints using a human personality that represents the majority of training data. In response to this, we validate that prompting LLMs with specific roles can allow LLMs to express diverse viewpoints. Building on this insight and observation, we develop FairThinking, a pipeline designed to automatically generate roles that enable LLMs to articulate diverse perspectives for fair expressions. To evaluate FairThinking, we create a dataset with a thousand items covering three fairness-related topics and conduct experiments on GPT-3.5, GPT-4, Llama2, and Mistral to demonstrate its superior performance.
- [1857] arXiv:2402.12151 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Transformer-based Causal Language Models Perform ClusteringComments: Added new experimental results and fixed some errorsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Even though large language models (LLMs) have demonstrated remarkable capability in solving various natural language tasks, the capability of an LLM to follow human instructions is still a concern. Recent works have shown great improvements in the instruction-following capability via additional training for instruction-following tasks. However, the mechanisms responsible for effective instruction-following capabilities remain inadequately understood. Here, we introduce a simplified instruction-following task and use synthetic datasets to analyze a Transformer-based causal language model. Our findings suggest that the model learns task-specific information by clustering data within its hidden space, with this clustering process evolving dynamically during learning. We also demonstrate how this phenomenon assists the model in handling unseen instances, and validate our results in a more realistic setting. Furthermore, we present inspired applications regarding pre-training and alignment.
- [1858] arXiv:2402.12161 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Endowing Pre-trained Graph Models with Provable FairnessComments: Accepted by WWW 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Social and Information Networks (cs.SI)
Abstract: Pre-trained graph models (PGMs) aim to capture transferable inherent structural properties and apply them to different downstream tasks. Similar to pre-trained language models, PGMs also inherit biases from human society, resulting in discriminatory behavior in downstream applications. The debiasing process of existing fair methods is generally coupled with parameter optimization of GNNs. However, different downstream tasks may be associated with different sensitive attributes in reality, directly employing existing methods to improve the fairness of PGMs is inflexible and inefficient. Moreover, most of them lack a theoretical guarantee, i.e., provable lower bounds on the fairness of model predictions, which directly provides assurance in a practical scenario. To overcome these limitations, we propose a novel adapter-tuning framework that endows pre-trained graph models with provable fairness (called GraphPAR). GraphPAR freezes the parameters of PGMs and trains a parameter-efficient adapter to flexibly improve the fairness of PGMs in downstream tasks. Specifically, we design a sensitive semantic augmenter on node representations, to extend the node representations with different sensitive attribute semantics for each node. The extended representations will be used to further train an adapter, to prevent the propagation of sensitive attribute semantics from PGMs to task predictions. Furthermore, with GraphPAR, we quantify whether the fairness of each node is provable, i.e., predictions are always fair within a certain range of sensitive attribute semantics. Experimental evaluations on real-world datasets demonstrate that GraphPAR achieves state-of-the-art prediction performance and fairness on node classification task. Furthermore, based on our GraphPAR, around 90\% nodes have provable fairness.
- [1859] arXiv:2402.12168 (cross-list from cs.CR) [ pdf , ps , html , other ]
-
Title: Defending Against Weight-Poisoning Backdoor Attacks for Parameter-Efficient Fine-TuningComments: NAACL Findings 2024Subjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Recently, various parameter-efficient fine-tuning (PEFT) strategies for application to language models have been proposed and successfully implemented. However, this raises the question of whether PEFT, which only updates a limited set of model parameters, constitutes security vulnerabilities when confronted with weight-poisoning backdoor attacks. In this study, we show that PEFT is more susceptible to weight-poisoning backdoor attacks compared to the full-parameter fine-tuning method, with pre-defined triggers remaining exploitable and pre-defined targets maintaining high confidence, even after fine-tuning. Motivated by this insight, we developed a Poisoned Sample Identification Module (PSIM) leveraging PEFT, which identifies poisoned samples through confidence, providing robust defense against weight-poisoning backdoor attacks. Specifically, we leverage PEFT to train the PSIM with randomly reset sample labels. During the inference process, extreme confidence serves as an indicator for poisoned samples, while others are clean. We conduct experiments on text classification tasks, five fine-tuning strategies, and three weight-poisoning backdoor attack methods. Experiments show near 100% success rates for weight-poisoning backdoor attacks when utilizing PEFT. Furthermore, our defensive approach exhibits overall competitive performance in mitigating weight-poisoning backdoor attacks.
- [1860] arXiv:2402.12170 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Unsupervised LLM Adaptation for Question AnsweringSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large language models (LLM) learn diverse knowledge present in the large-scale training dataset via self-supervised training. Followed by instruction-tuning, LLM acquires the ability to return correct information for diverse questions. However, adapting these pre-trained LLMs to new target domains, such as different organizations or periods, for the question-answering (QA) task incurs a substantial annotation cost. To tackle this challenge, we propose a novel task, unsupervised LLM adaptation for question answering. In this task, we leverage a pre-trained LLM, a publicly available QA dataset (source data), and unlabeled documents from the target domain. Our goal is to learn LLM that can answer questions about the target domain. We introduce one synthetic and two real datasets to evaluate models fine-tuned on the source and target data, and reveal intriguing insights; (i) fine-tuned models exhibit the ability to provide correct answers for questions about the target domain even though they do not see any questions about the information described in the unlabeled documents, but (ii) they have difficulties in accessing information located in the middle or at the end of documents, and (iii) this challenge can be partially mitigated by replacing input tokens with random ones during adaptation.
- [1861] arXiv:2402.12177 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Mafin: Enhancing Black-Box Embeddings with Model Augmented Fine-TuningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Retrieval Augmented Generation (RAG) has emerged as an effective solution for mitigating hallucinations in Large Language Models (LLMs). The retrieval stage in RAG typically involves a pre-trained embedding model, which converts queries and passages into vectors to capture their semantics. However, a standard pre-trained embedding model may exhibit sub-optimal performance when applied to specific domain knowledge, necessitating fine-tuning. This paper addresses scenarios where the embeddings are only available from a black-box model. We introduce Model augmented fine-tuning (Mafin) -- a novel approach for fine-tuning a black-box embedding model by augmenting it with a trainable embedding model. Our results demonstrate that Mafin significantly enhances the performance of the black-box embeddings by only requiring the training of a small augmented model. We validate the effectiveness of our method on both labeled and unlabeled datasets, illustrating its broad applicability and efficiency.
- [1862] arXiv:2402.12179 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Examining Monitoring System: Detecting Abnormal Behavior In Online ExaminationsDinh An Ngo , Thanh Dat Nguyen , Thi Le Chi Dang , Huy Hoan Le , Ton Bao Ho , Vo Thanh Khang Nguyen , Truong Thanh Hung NguyenSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: Cheating in online exams has become a prevalent issue over the past decade, especially during the COVID-19 pandemic. To address this issue of academic dishonesty, our "Exam Monitoring System: Detecting Abnormal Behavior in Online Examinations" is designed to assist proctors in identifying unusual student behavior. Our system demonstrates high accuracy and speed in detecting cheating in real-time scenarios, providing valuable information, and aiding proctors in decision-making. This article outlines our methodology and the effectiveness of our system in mitigating the widespread problem of cheating in online exams.
- [1863] arXiv:2402.12181 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Revisiting Data Augmentation in Deep Reinforcement LearningComments: Accepted in ICLR 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Various data augmentation techniques have been recently proposed in image-based deep reinforcement learning (DRL). Although they empirically demonstrate the effectiveness of data augmentation for improving sample efficiency or generalization, which technique should be preferred is not always clear. To tackle this question, we analyze existing methods to better understand them and to uncover how they are connected. Notably, by expressing the variance of the Q-targets and that of the empirical actor/critic losses of these methods, we can analyze the effects of their different components and compare them. We furthermore formulate an explanation about how these methods may be affected by choosing different data augmentation transformations in calculating the target Q-values. This analysis suggests recommendations on how to exploit data augmentation in a more principled way. In addition, we include a regularization term called tangent prop, previously proposed in computer vision, but whose adaptation to DRL is novel to the best of our knowledge. We evaluate our proposition and validate our analysis in several domains. Compared to different relevant baselines, we demonstrate that it achieves state-of-the-art performance in most environments and shows higher sample efficiency and better generalization ability in some complex environments.
- [1864] arXiv:2402.12202 (cross-list from cs.IR) [ pdf , ps , other ]
-
Title: Heterogeneity-aware Cross-school Electives Recommendation: a Hybrid Federated ApproachJournal-ref: 2023 IEEE International Conference on Data Mining Workshops (ICDMW)Subjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI)
Abstract: In the era of modern education, addressing cross-school learner diversity is crucial, especially in personalized recommender systems for elective course selection. However, privacy concerns often limit cross-school data sharing, which hinders existing methods' ability to model sparse data and address heterogeneity effectively, ultimately leading to suboptimal recommendations. In response, we propose HFRec, a heterogeneity-aware hybrid federated recommender system designed for cross-school elective course recommendations. The proposed model constructs heterogeneous graphs for each school, incorporating various interactions and historical behaviors between students to integrate context and content information. We design an attention mechanism to capture heterogeneity-aware representations. Moreover, under a federated scheme, we train individual school-based models with adaptive learning settings to recommend tailored electives. Our HFRec model demonstrates its effectiveness in providing personalized elective recommendations while maintaining privacy, as it outperforms state-of-the-art models on both open-source and real-world datasets.
- [1865] arXiv:2402.12216 (cross-list from cs.CY) [ pdf , ps , html , other ]
-
Title: Copyleft for Alleviating AIGC Copyright Dilemma: What-if Analysis, Public Perception and ImplicationsComments: 9 pages, 8 figuresSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: As AIGC has impacted our society profoundly in the past years, ethical issues have received tremendous attention. The most urgent one is the AIGC copyright dilemma, which can immensely stifle the development of AIGC and greatly cost the entire society. Given the complexity of AIGC copyright governance and the fact that no perfect solution currently exists, previous work advocated copyleft on AI governance but without substantive analysis. In this paper, we take a step further to explore the feasibility of copyleft to alleviate the AIGC copyright dilemma. We conduct a mixed-methods study from two aspects: qualitatively, we use a formal what-if analysis to clarify the dilemma and provide case studies to show the feasibility of copyleft; quantitatively, we perform a carefully designed survey to find out how the public feels about copylefting AIGC. The key findings include: a) people generally perceive the dilemma, b) they prefer to use authorized AIGC under loose restriction, and c) they are positive to copyleft in AIGC and willing to use it in the future.
- [1866] arXiv:2402.12219 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Reformatted AlignmentRun-Ze Fan , Xuefeng Li , Haoyang Zou , Junlong Li , Shwai He , Ethan Chern , Jiewen Hu , Pengfei LiuComments: Homepage: this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The quality of finetuning data is crucial for aligning large language models (LLMs) with human values. Current methods to improve data quality are either labor-intensive or prone to factual errors caused by LLM hallucinations. This paper explores elevating the quality of existing instruction data to better align with human values, introducing a simple and effective approach named ReAlign, which reformats the responses of instruction data into a format that better aligns with pre-established criteria and the collated evidence. This approach minimizes human annotation, hallucination, and the difficulty in scaling, remaining orthogonal to existing alignment techniques. Experimentally, ReAlign significantly boosts the general alignment ability, math reasoning, factuality, and readability of the LLMs.
Encouragingly, without introducing any additional data or advanced training techniques, and merely by reformatting the response, LLaMA-2-13B's mathematical reasoning ability on GSM8K can be improved from 46.77% to 56.63% in accuracy. Additionally, a mere 5% of ReAlign data yields a 67% boost in general alignment ability measured by the Alpaca dataset. This work highlights the need for further research into the science and mechanistic interpretability of LLMs. We have made the associated code and data publicly accessible to support future studies at this https URL . - [1867] arXiv:2402.12226 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: AnyGPT: Unified Multimodal LLM with Discrete Sequence ModelingJun Zhan , Junqi Dai , Jiasheng Ye , Yunhua Zhou , Dong Zhang , Zhigeng Liu , Xin Zhang , Ruibin Yuan , Ge Zhang , Linyang Li , Hang Yan , Jie Fu , Tao Gui , Tianxiang Sun , Yugang Jiang , Xipeng QiuComments: 28 pages, 16 figures, under review, work in progressSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: We introduce AnyGPT, an any-to-any multimodal language model that utilizes discrete representations for the unified processing of various modalities, including speech, text, images, and music. AnyGPT can be trained stably without any alterations to the current large language model (LLM) architecture or training paradigms. Instead, it relies exclusively on data-level preprocessing, facilitating the seamless integration of new modalities into LLMs, akin to the incorporation of new languages. We build a multimodal text-centric dataset for multimodal alignment pre-training. Utilizing generative models, we synthesize the first large-scale any-to-any multimodal instruction dataset. It consists of 108k samples of multi-turn conversations that intricately interweave various modalities, thus equipping the model to handle arbitrary combinations of multimodal inputs and outputs. Experimental results demonstrate that AnyGPT is capable of facilitating any-to-any multimodal conversation while achieving performance comparable to specialized models across all modalities, proving that discrete representations can effectively and conveniently unify multiple modalities within a language model. Demos are shown in this https URL
- [1868] arXiv:2402.12232 (cross-list from stat.ML) [ pdf , ps , other ]
-
Title: Kernel KMeans clustering splits for end-to-end unsupervised decision treesSubjects: Machine Learning (stat.ML) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Trees are convenient models for obtaining explainable predictions on relatively small datasets. Although there are many proposals for the end-to-end construction of such trees in supervised learning, learning a tree end-to-end for clustering without labels remains an open challenge. As most works focus on interpreting with trees the result of another clustering algorithm, we present here a novel end-to-end trained unsupervised binary tree for clustering: Kauri. This method performs a greedy maximisation of the kernel KMeans objective without requiring the definition of centroids. We compare this model on multiple datasets with recent unsupervised trees and show that Kauri performs identically when using a linear kernel. For other kernels, Kauri often outperforms the concatenation of kernel KMeans and a CART decision tree.
- [1869] arXiv:2402.12237 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Learning to Defer in Content Moderation: The Human-AI InterplaySubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Human-Computer Interaction (cs.HC); Performance (cs.PF)
Abstract: Successful content moderation in online platforms relies on a human-AI collaboration approach. A typical heuristic estimates the expected harmfulness of a post and uses fixed thresholds to decide whether to remove it and whether to send it for human review. This disregards the prediction uncertainty, the time-varying element of human review capacity and post arrivals, and the selective sampling in the dataset (humans only review posts filtered by the admission algorithm).
In this paper, we introduce a model to capture the human-AI interplay in content moderation. The algorithm observes contextual information for incoming posts, makes classification and admission decisions, and schedules posts for human review. Only admitted posts receive human reviews on their harmfulness. These reviews help educate the machine-learning algorithms but are delayed due to congestion in the human review system. The classical learning-theoretic way to capture this human-AI interplay is via the framework of learning to defer, where the algorithm has the option to defer a classification task to humans for a fixed cost and immediately receive feedback. Our model contributes to this literature by introducing congestion in the human review system. Moreover, unlike work on online learning with delayed feedback where the delay in the feedback is exogenous to the algorithm's decisions, the delay in our model is endogenous to both the admission and the scheduling decisions.
We propose a near-optimal learning algorithm that carefully balances the classification loss from a selectively sampled dataset, the idiosyncratic loss of non-reviewed posts, and the delay loss of having congestion in the human review system. To the best of our knowledge, this is the first result for online learning in contextual queueing systems and hence our analytical framework may be of independent interest. - [1870] arXiv:2402.12240 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: BEARS Make Neuro-Symbolic Models Aware of their Reasoning ShortcutsEmanuele Marconato , Samuele Bortolotti , Emile van Krieken , Antonio Vergari , Andrea Passerini , Stefano TesoSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Neuro-Symbolic (NeSy) predictors that conform to symbolic knowledge - encoding, e.g., safety constraints - can be affected by Reasoning Shortcuts (RSs): They learn concepts consistent with the symbolic knowledge by exploiting unintended semantics. RSs compromise reliability and generalization and, as we show in this paper, they are linked to NeSy models being overconfident about the predicted concepts. Unfortunately, the only trustworthy mitigation strategy requires collecting costly dense supervision over the concepts. Rather than attempting to avoid RSs altogether, we propose to ensure NeSy models are aware of the semantic ambiguity of the concepts they learn, thus enabling their users to identify and distrust low-quality concepts. Starting from three simple desiderata, we derive bears (BE Aware of Reasoning Shortcuts), an ensembling technique that calibrates the model's concept-level confidence without compromising prediction accuracy, thus encouraging NeSy architectures to be uncertain about concepts affected by RSs. We show empirically that bears improves RS-awareness of several state-of-the-art NeSy models, and also facilitates acquiring informative dense annotations for mitigation purposes.
- [1871] arXiv:2402.12264 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Uncertainty quantification in fine-tuned LLMs using LoRA ensemblesComments: 8 pages, 4 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)
Abstract: Fine-tuning large language models can improve task specific performance, although a general understanding of what the fine-tuned model has learned, forgotten and how to trust its predictions is still missing. We derive principled uncertainty quantification for fine-tuned LLMs with posterior approximations using computationally efficient low-rank adaptation ensembles. We analyze three common multiple-choice datasets using low-rank adaptation ensembles based on Mistral-7b, and draw quantitative and qualitative conclusions on their perceived complexity and model efficacy on the different target domains during and after fine-tuning. In particular, backed by the numerical experiments, we hypothesise about signals from entropic uncertainty measures for data domains that are inherently difficult for a given architecture to learn.
- [1872] arXiv:2402.12265 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: On the Byzantine-Resilience of Distillation-Based Federated LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: Federated Learning (FL) algorithms using Knowledge Distillation (KD) have received increasing attention due to their favorable properties with respect to privacy, non-i.i.d. data and communication cost. These methods depart from transmitting model parameters and, instead, communicate information about a learning task by sharing predictions on a public dataset. In this work, we study the performance of such approaches in the byzantine setting, where a subset of the clients act in an adversarial manner aiming to disrupt the learning process. We show that KD-based FL algorithms are remarkably resilient and analyze how byzantine clients can influence the learning process compared to Federated Averaging. Based on these insights, we introduce two new byzantine attacks and demonstrate that they are effective against prior byzantine-resilient methods. Additionally, we propose FilterExp, a novel method designed to enhance the byzantine resilience of KD-based FL algorithms and demonstrate its efficacy. Finally, we provide a general method to make attacks harder to detect, improving their effectiveness.
- [1873] arXiv:2402.12279 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Key ingredients for effective zero-shot cross-lingual knowledge transfer in generative tasksComments: NAACL 2024 final version. arXiv admin note: text overlap with arXiv:2310.09917Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Zero-shot cross-lingual knowledge transfer enables a multilingual pretrained language model, finetuned on a task in one language, make predictions for this task in other languages. While being broadly studied for natural language understanding tasks, the described setting is understudied for generation. Previous works notice a frequent problem of generation in a wrong language and propose approaches to address it, usually using mT5 as a backbone model. In this work we compare various approaches proposed from the literature in unified settings, also including alternative backbone models, namely mBART and NLLB-200. We first underline the importance of tuning learning rate used for finetuning, which helps to substantially alleviate the problem of generation in the wrong language. Then, we show that with careful learning rate tuning, the simple full finetuning of the model acts as a very strong baseline and alternative approaches bring only marginal improvements. Finally, we find that mBART performs similarly to mT5 of the same size, and NLLB-200 can be competitive in some cases. Our final zero-shot models reach the performance of the approach based on data translation which is usually considered as an upper baseline for zero-shot cross-lingual transfer in generation.
- [1874] arXiv:2402.12280 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Adaptive Skeleton Graph DecodingShuowei Jin , Yongji Wu , Haizhong Zheng , Qingzhao Zhang , Matthew Lentz , Z. Morley Mao , Atul Prakash , Feng Qian , Danyang ZhuoSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large language models (LLMs) have seen significant adoption for natural language tasks, owing their success to massive numbers of model parameters (e.g., 70B+); however, LLM inference incurs significant computation and memory costs. Recent approaches propose parallel decoding strategies, such as Skeleton-of-Thought (SoT), to improve performance by breaking prompts down into sub-problems that can be decoded in parallel; however, they often suffer from reduced response quality. Our key insight is that we can request additional information, specifically dependencies and difficulty, when generating the sub-problems to improve both response quality and performance. In this paper, we propose Skeleton Graph Decoding (SGD), which uses dependencies exposed between sub-problems to support information forwarding between dependent sub-problems for improved quality while exposing parallelization opportunities for decoding independent sub-problems. Additionally, we leverage difficulty estimates for each sub-problem to select an appropriately-sized model, improving performance without significantly reducing quality. Compared to standard autoregressive generation and SoT, SGD achieves a 1.69x speedup while improving quality by up to 51%.
- [1875] arXiv:2402.12284 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Refining Minimax Regret for Unsupervised Environment DesignMichael Beukman , Samuel Coward , Michael Matthews , Mattie Fellows , Minqi Jiang , Michael Dennis , Jakob FoersterComments: The first two authors contributed equallySubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In unsupervised environment design, reinforcement learning agents are trained on environment configurations (levels) generated by an adversary that maximises some objective. Regret is a commonly used objective that theoretically results in a minimax regret (MMR) policy with desirable robustness guarantees; in particular, the agent's maximum regret is bounded. However, once the agent reaches this regret bound on all levels, the adversary will only sample levels where regret cannot be further reduced. Although there are possible performance improvements to be made outside of these regret-maximising levels, learning stagnates. In this work, we introduce Bayesian level-perfect MMR (BLP), a refinement of the minimax regret objective that overcomes this limitation. We formally show that solving for this objective results in a subset of MMR policies, and that BLP policies act consistently with a Perfect Bayesian policy over all levels. We further introduce an algorithm, ReMiDi, that results in a BLP policy at convergence. We empirically demonstrate that training on levels from a minimax regret adversary causes learning to prematurely stagnate, but that ReMiDi continues learning.
- [1876] arXiv:2402.12298 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Is Open-Source There Yet? A Comparative Study on Commercial and Open-Source LLMs in Their Ability to Label Chest X-Ray ReportsFelix J. Dorfner , Liv Jürgensen , Leonhard Donle , Fares Al Mohamad , Tobias R. Bodenmann , Mason C. Cleveland , Felix Busch , Lisa C. Adams , James Sato , Thomas Schultz , Albert E. Kim , Jameson Merkow , Keno K. Bressem , Christopher P. BridgeSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Introduction: With the rapid advances in large language models (LLMs), there have been numerous new open source as well as commercial models. While recent publications have explored GPT-4 in its application to extracting information of interest from radiology reports, there has not been a real-world comparison of GPT-4 to different leading open-source models.
Materials and Methods: Two different and independent datasets were used. The first dataset consists of 540 chest x-ray reports that were created at the Massachusetts General Hospital between July 2019 and July 2021. The second dataset consists of 500 chest x-ray reports from the ImaGenome dataset. We then compared the commercial models GPT-3.5 Turbo and GPT-4 from OpenAI to the open-source models Mistral-7B, Mixtral-8x7B, Llama2-13B, Llama2-70B, QWEN1.5-72B and CheXbert and CheXpert-labeler in their ability to accurately label the presence of multiple findings in x-ray text reports using different prompting techniques.
Results: On the ImaGenome dataset, the best performing open-source model was Llama2-70B with micro F1-scores of 0.972 and 0.970 for zero- and few-shot prompts, respectively. GPT-4 achieved micro F1-scores of 0.975 and 0.984, respectively. On the institutional dataset, the best performing open-source model was QWEN1.5-72B with micro F1-scores of 0.952 and 0.965 for zero- and few-shot prompting, respectively. GPT-4 achieved micro F1-scores of 0.975 and 0.973, respectively.
Conclusion: In this paper, we show that while GPT-4 is superior to open-source models in zero-shot report labeling, the implementation of few-shot prompting can bring open-source models on par with GPT-4. This shows that open-source models could be a performant and privacy preserving alternative to GPT-4 for the task of radiology report classification. - [1877] arXiv:2402.12307 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Multi-View Conformal Learning for Heterogeneous Sensor FusionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Being able to assess the confidence of individual predictions in machine learning models is crucial for decision making scenarios. Specially, in critical applications such as medical diagnosis, security, and unmanned vehicles, to name a few. In the last years, complex predictive models have had great success in solving hard tasks and new methods are being proposed every day. While the majority of new developments in machine learning models focus on improving the overall performance, less effort is put on assessing the trustworthiness of individual predictions, and even to a lesser extent, in the context of sensor fusion. To this end, we build and test multi-view and single-view conformal models for heterogeneous sensor fusion. Our models provide theoretical marginal confidence guarantees since they are based on the conformal prediction framework. We also propose a multi-view semi-conformal model based on sets intersection. Through comprehensive experimentation, we show that multi-view models perform better than single-view models not only in terms of accuracy-based performance metrics (as it has already been shown in several previous works) but also in conformal measures that provide uncertainty estimation. Our results also showed that multi-view models generate prediction sets with less uncertainty compared to single-view models.
- [1878] arXiv:2402.12317 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: ARKS: Active Retrieval in Knowledge Soup for Code GenerationComments: Retrieval-augmented code generationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Recently the retrieval-augmented generation (RAG) paradigm has raised much attention for its potential in incorporating external knowledge into large language models (LLMs) without further training. While widely explored in natural language applications, its utilization in code generation remains under-explored. In this paper, we introduce Active Retrieval in Knowledge Soup (ARKS), an advanced strategy for generalizing large language models for code. In contrast to relying on a single source, we construct a knowledge soup integrating web search, documentation, execution feedback, and evolved code snippets. We employ an active retrieval strategy that iteratively refines the query and updates the knowledge soup. To assess the performance of ARKS, we compile a new benchmark comprising realistic coding problems associated with frequently updated libraries and long-tail programming languages. Experimental results on ChatGPT and CodeLlama demonstrate a substantial improvement in the average execution accuracy of ARKS on LLMs. The analysis confirms the effectiveness of our proposed knowledge soup and active retrieval strategies, offering rich insights into the construction of effective retrieval-augmented code generation (RACG) pipelines. Our model, code, and data are available at this https URL .
- [1879] arXiv:2402.12319 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Dynamic Environment Responsive Online Meta-Learning with Fairness AwarenessComments: Accepted by TKDD, extended from KDD 2022. arXiv admin note: substantial text overlap with arXiv:2205.11264Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: The fairness-aware online learning framework has emerged as a potent tool within the context of continuous lifelong learning. In this scenario, the learner's objective is to progressively acquire new tasks as they arrive over time, while also guaranteeing statistical parity among various protected sub-populations, such as race and gender, when it comes to the newly introduced tasks. A significant limitation of current approaches lies in their heavy reliance on the i.i.d (independent and identically distributed) assumption concerning data, leading to a static regret analysis of the framework. Nevertheless, it's crucial to note that achieving low static regret does not necessarily translate to strong performance in dynamic environments characterized by tasks sampled from diverse distributions. In this paper, to tackle the fairness-aware online learning challenge in evolving settings, we introduce a unique regret measure, FairSAR, by incorporating long-term fairness constraints into a strongly adapted loss regret framework. Moreover, to determine an optimal model parameter at each time step, we introduce an innovative adaptive fairness-aware online meta-learning algorithm, referred to as FairSAOML. This algorithm possesses the ability to adjust to dynamic environments by effectively managing bias control and model accuracy. The problem is framed as a bi-level convex-concave optimization, considering both the model's primal and dual parameters, which pertain to its accuracy and fairness attributes, respectively. Theoretical analysis yields sub-linear upper bounds for both loss regret and the cumulative violation of fairness constraints. Our experimental evaluation on various real-world datasets in dynamic environments demonstrates that our proposed FairSAOML algorithm consistently outperforms alternative approaches rooted in the most advanced prior online learning methods.
- [1880] arXiv:2402.12329 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Query-Based Adversarial Prompt GenerationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Abstract: Recent work has shown it is possible to construct adversarial examples that cause an aligned language model to emit harmful strings or perform harmful behavior. Existing attacks work either in the white-box setting (with full access to the model weights), or through transferability: the phenomenon that adversarial examples crafted on one model often remain effective on other models. We improve on prior work with a query-based attack that leverages API access to a remote language model to construct adversarial examples that cause the model to emit harmful strings with (much) higher probability than with transfer-only attacks. We validate our attack on GPT-3.5 and OpenAI's safety classifier; we can cause GPT-3.5 to emit harmful strings that current transfer attacks fail at, and we can evade the safety classifier with nearly 100% probability.
- [1881] arXiv:2402.12331 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Generating Survival Interpretable Trajectories and DataSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: A new model for generating survival trajectories and data based on applying an autoencoder of a specific structure is proposed. It solves three tasks. First, it provides predictions in the form of the expected event time and the survival function for a new generated feature vector on the basis of the Beran estimator. Second, the model generates additional data based on a given training set that would supplement the original dataset. Third, the most important, it generates a prototype time-dependent trajectory for an object, which characterizes how features of the object could be changed to achieve a different time to an event. The trajectory can be viewed as a type of the counterfactual explanation. The proposed model is robust during training and inference due to a specific weighting scheme incorporating into the variational autoencoder. The model also determines the censored indicators of new generated data by solving a classification task. The paper demonstrates the efficiency and properties of the proposed model using numerical experiments on synthetic and real datasets. The code of the algorithm implementing the proposed model is publicly available.
- [1882] arXiv:2402.12336 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language ModelsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Abstract: Multi-modal foundation models like OpenFlamingo, LLaVA, and GPT-4 are increasingly used for various real-world tasks. Prior work has shown that these models are highly vulnerable to adversarial attacks on the vision modality. These attacks can be leveraged to spread fake information or defraud users, and thus pose a significant risk, which makes the robustness of large multi-modal foundation models a pressing problem. The CLIP model, or one of its variants, is used as a frozen vision encoder in many vision-language models (VLMs), e.g. LLaVA and OpenFlamingo. We propose an unsupervised adversarial fine-tuning scheme to obtain a robust CLIP vision encoder, which yields robustness on all vision down-stream tasks (VLMs, zero-shot classification) that rely on CLIP. In particular, we show that stealth-attacks on users of VLMs by a malicious third party providing manipulated images are no longer possible once one replaces the original CLIP model with our robust one. No retraining or fine-tuning of the VLM is required. The code and robust models are available at this https URL
- [1883] arXiv:2402.12343 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!Comments: Code is available at this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Large language models (LLMs) need to undergo safety alignment to ensure safe conversations with humans. However, this paper introduces an inference-time attack method, demonstrating that safety alignment can be easily reversed to produce harmful language models without additional training. Specifically, this reversal is achieved by contrasting the output token distribution of a safety-aligned language model (e.g., Llama-2-chat) against its pre-trained version (e.g., Llama-2) so that the token predictions are shifted towards the opposite direction of alignment. We name this method emulated disalignment (ED) because it uses pure sampling to provably emulate (or "approximate") the result of fine-tuning the pre-trained model to minimize a safety reward. Our experiments with ED across three evaluation datasets and four model families (Llama-1, Llama-2, Mistral, and Alpaca) show that ED doubles the harmfulness of pre-trained models and outperforms strong baselines, achieving the highest harmful rate in 43 out of 48 evaluation subsets by a large margin. Eventually, given ED's need for language model output token distributions, which particularly compromises open-source models, our findings highlight the importance of reevaluating the practice of open-sourcing language models even after safety alignment.
- [1884] arXiv:2402.12348 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic EvaluationsJinhao Duan , Renming Zhang , James Diffenderfer , Bhavya Kailkhura , Lichao Sun , Elias Stengel-Eskin , Mohit Bansal , Tianlong Chen , Kaidi XuComments: 26 pages; the first two authors contributed equally; GTBench HF Leaderboard: this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: As Large Language Models (LLMs) are integrated into critical real-world applications, their strategic and logical reasoning abilities are increasingly crucial. This paper evaluates LLMs' reasoning abilities in competitive environments through game-theoretic tasks, e.g., board and card games that require pure logic and strategic reasoning to compete with opponents. We first propose GTBench, a language-driven environment composing 10 widely-recognized tasks, across a comprehensive game taxonomy: complete versus incomplete information, dynamic versus static, and probabilistic versus deterministic scenarios. Then, we investigate two key problems: (1) Characterizing game-theoretic reasoning of LLMs; (2) LLM-vs-LLM competitions as reasoning evaluation. We observe that (1) LLMs have distinct behaviors regarding various gaming scenarios; for example, LLMs fail in complete and deterministic games yet they are competitive in probabilistic gaming scenarios; (2) Open-source LLMs, e.g., CodeLlama-34b-Instruct, are less competitive than commercial LLMs, e.g., GPT-4, in complex games. In addition, code-pretraining greatly benefits strategic reasoning, while advanced reasoning methods such as Chain-of-Thought (CoT) and Tree-of-Thought (ToT) do not always help. Detailed error profiles are also provided for a better understanding of LLMs' behavior.
- [1885] arXiv:2402.12354 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: LoRA+: Efficient Low Rank Adaptation of Large ModelsComments: 27 pagesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)
Abstract: In this paper, we show that Low Rank Adaptation (LoRA) as originally introduced in Hu et al. (2021) leads to suboptimal finetuning of models with large width (embedding dimension). This is due to the fact that adapter matrices A and B in LoRA are updated with the same learning rate. Using scaling arguments for large width networks, we demonstrate that using the same learning rate for A and B does not allow efficient feature learning. We then show that this suboptimality of LoRA can be corrected simply by setting different learning rates for the LoRA adapter matrices A and B with a well-chosen ratio. We call this proposed algorithm LoRA$+$. In our extensive experiments, LoRA$+$ improves performance (1-2 $\%$ improvements) and finetuning speed (up to $\sim$ 2X SpeedUp), at the same computational cost as LoRA.
- [1886] arXiv:2402.12360 (cross-list from math.NA) [ pdf , ps , other ]
-
Title: Nonlinear Discrete-Time Observers with Physics-Informed Neural NetworksHector Vargas Alvarez , Gianluca Fabiani , Ioannis G. Kevrekidis , Nikolaos Kazantzis , Constantinos SiettosSubjects: Numerical Analysis (math.NA) ; Artificial Intelligence (cs.AI); Dynamical Systems (math.DS)
Abstract: We use Physics-Informed Neural Networks (PINNs) to solve the discrete-time nonlinear observer state estimation problem. Integrated within a single-step exact observer linearization framework, the proposed PINN approach aims at learning a nonlinear state transformation map by solving a system of inhomogeneous functional equations. The performance of the proposed PINN approach is assessed via two illustrative case studies for which the observer linearizing transformation map can be derived analytically. We also perform an uncertainty quantification analysis for the proposed PINN scheme and we compare it with conventional power-series numerical implementations, which rely on the computation of a power series solution.
- [1887] arXiv:2402.12365 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Universal Physics Transformers: A Framework For Efficiently Scaling Neural OperatorsBenedikt Alkin , Andreas Fürst , Simon Schmid , Lukas Gruber , Markus Holzleitner , Johannes BrandstetterSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Fluid Dynamics (physics.flu-dyn)
Abstract: Neural operators, serving as physics surrogate models, have recently gained increased interest. With ever increasing problem complexity, the natural question arises: what is an efficient way to scale neural operators to larger and more complex simulations - most importantly by taking into account different types of simulation datasets. This is of special interest since, akin to their numerical counterparts, different techniques are used across applications, even if the underlying dynamics of the systems are similar. Whereas the flexibility of transformers has enabled unified architectures across domains, neural operators mostly follow a problem specific design, where GNNs are commonly used for Lagrangian simulations and grid-based models predominate Eulerian simulations. We introduce Universal Physics Transformers (UPTs), an efficient and unified learning paradigm for a wide range of spatio-temporal problems. UPTs operate without grid- or particle-based latent structures, enabling flexibility and scalability across meshes and particles. UPTs efficiently propagate dynamics in the latent space, emphasized by inverse encoding and decoding techniques. Finally, UPTs allow for queries of the latent space representation at any point in space-time. We demonstrate diverse applicability and efficacy of UPTs in mesh-based fluid simulations, and steady-state Reynolds averaged Navier-Stokes simulations, and Lagrangian-based dynamics.
- [1888] arXiv:2402.12366 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: A Critical Evaluation of AI Feedback for Aligning Large Language ModelsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Reinforcement learning with AI feedback (RLAIF) is a popular paradigm for improving the instruction-following abilities of powerful pre-trained language models. RLAIF first performs supervised fine-tuning (SFT) using demonstrations from a teacher model and then further fine-tunes the model with reinforcement learning (RL), using feedback from a critic model. While recent popular open-source models have demonstrated substantial improvements in performance from the RL step, in this paper we question whether the complexity of this RL step is truly warranted for AI feedback. We show that the improvements of the RL step are virtually entirely due to the widespread practice of using a weaker teacher model (e.g. GPT-3.5) for SFT data collection than the critic (e.g., GPT-4) used for AI feedback generation. Specifically, we show that simple supervised fine-tuning with GPT-4 as the teacher outperforms existing RLAIF pipelines. More generally, we find that the gains from RLAIF vary substantially across base model families, test-time evaluation protocols, and critic models. Finally, we provide a mechanistic explanation for when SFT may outperform the full two-step RLAIF pipeline as well as suggestions for making RLAIF maximally useful in practice.
- [1889] arXiv:2402.12370 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: AnaloBench: Benchmarking the Identification of Abstract and Long-context AnalogiesXiao Ye , Andrew Wang , Jacob Choi , Yining Lu , Shreya Sharma , Lingfeng Shen , Vijay Tiyyala , Nicholas Andrews , Daniel KhashabiSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Humans regularly engage in analogical thinking, relating personal experiences to current situations ($X$ is analogous to $Y$ because of $Z$). Analogical thinking allows humans to solve problems in creative ways, grasp difficult concepts, and articulate ideas more effectively. Can language models (LMs) do the same? To answer this question, we propose ANALOBENCH, a benchmark to determine analogical reasoning ability in LMs. Our benchmarking approach focuses on aspects of this ability that are common among humans: (i) recalling related experiences from a large amount of information, and (ii) applying analogical reasoning to complex and lengthy scenarios. We test a broad collection of proprietary models (e.g., GPT family, Claude V2) and open source models such as LLaMA2. As in prior results, scaling up LMs results in some performance boosts. Surprisingly, scale offers minimal gains when, (i) analogies involve lengthy scenarios, or (ii) recalling relevant scenarios from a large pool of information, a process analogous to finding a needle in a haystack. We hope these observations encourage further research in this field.
- [1890] arXiv:2402.12373 (cross-list from cs.PL) [ pdf , ps , html , other ]
-
Title: LTL learning on GPUsComments: 27 pagesSubjects: Programming Languages (cs.PL) ; Artificial Intelligence (cs.AI)
Abstract: Linear temporal logic (LTL) is widely used in industrial verification. LTL formulae can be learned from traces. Scaling LTL formula learning is an open problem. We implement the first GPU-based LTL learner using a novel form of enumerative program synthesis. The learner is sound and complete. Our benchmarks indicate that it handles traces at least 2048 times more numerous, and on average at least 46 times faster than existing state-of-the-art learners. This is achieved with, among others, novel branch-free LTL semantics that has $O(\log n)$ time complexity, where $n$ is trace length, while previous implementations are $O(n^2)$ or worse (assuming bitwise boolean operations and shifts by powers of 2 have unit costs -- a realistic assumption on modern processors).
- [1891] arXiv:2402.12390 (cross-list from cs.SI) [ pdf , ps , other ]
-
Title: A Semantic Social Network Analysis Tool for Sensitivity Analysis and What-If Scenario Testing in Alcohol Consumption StudiesJosé Alberto Benítez-Andrades , Alejandro Rodríguez-González , Carmen Benavides , Leticia Sánchez-Valdeón , Isaías GarcíaJournal-ref: Int. J. Environ. Res. Public Health 2018, 15(11), 2420;Subjects: Social and Information Networks (cs.SI) ; Artificial Intelligence (cs.AI)
Abstract: Social Network Analysis (SNA) is a set of techniques developed in the field of social and behavioral sciences research, in order to characterize and study the social relationships that are established among a set of individuals. When building a social network for performing an SNA analysis, an initial process of data gathering is achieved in order to extract the characteristics of the individuals and their relationships. This is usually done by completing a questionnaire containing different types of questions that will be later used to obtain the SNA measures needed to perform the study. There are, then, a great number of different possible network generating questions and also many possibilities for mapping the responses to the corresponding characteristics and relationships. Many variations may be introduced into these questions (the way they are posed, the weights given to each of the responses, etc.) that may have an effect on the resulting networks. All these different variations are difficult to achieve manually, because the process is time-consuming and error prone. The tool described in this paper uses semantic knowledge representation techniques in order to facilitate this kind of sensitivity studies. The base of the tool is a conceptual structure, called "ontology" that is able to represent the different concepts and their definitions. The tool is compared to other similar ones, and the advantages of the approach are highlighted, giving some particular examples from an ongoing SNA study about alcohol consumption habits in adolescents.
- [1892] arXiv:2402.12391 (cross-list from q-bio.GN) [ pdf , ps , html , other ]
-
Title: Toward a Team of AI-made Scientists for Scientific Discovery from Gene Expression DataHaoyang Liu , Yijiang Li , Jinglin Jian , Yuxuan Cheng , Jianrong Lu , Shuyi Guo , Jinglei Zhu , Mianchen Zhang , Miantong Zhang , Haohan WangComments: 18 pages, 2 figures; added contactSubjects: Genomics (q-bio.GN) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Machine learning has emerged as a powerful tool for scientific discovery, enabling researchers to extract meaningful insights from complex datasets. For instance, it has facilitated the identification of disease-predictive genes from gene expression data, significantly advancing healthcare. However, the traditional process for analyzing such datasets demands substantial human effort and expertise for the data selection, processing, and analysis. To address this challenge, we introduce a novel framework, a Team of AI-made Scientists (TAIS), designed to streamline the scientific discovery pipeline. TAIS comprises simulated roles, including a project manager, data engineer, and domain expert, each represented by a Large Language Model (LLM). These roles collaborate to replicate the tasks typically performed by data scientists, with a specific focus on identifying disease-predictive genes. Furthermore, we have curated a benchmark dataset to assess TAIS's effectiveness in gene identification, demonstrating our system's potential to significantly enhance the efficiency and scope of scientific exploration. Our findings represent a solid step towards automating scientific discovery through large language models.
- [1893] arXiv:2402.12392 (cross-list from stat.AP) [ pdf , ps , html , other ]
-
Title: A Regression Mixture Model to understand the effect of the Covid-19 pandemic on Public Transport RidershipSubjects: Applications (stat.AP) ; Artificial Intelligence (cs.AI)
Abstract: The Covid-19 pandemic drastically changed urban mobility, both during the height of the pandemic with government lockdowns, but also in the longer term with the adoption of working-from-home policies. To understand its effects on rail public transport ridership, we propose a dedicated Regression Mixture Model able to perform both the clustering of public transport stations and the segmentation of time periods, while ignoring variations due to additional variables such as the official lockdowns or non-working days. Each cluster is thus defined by a series of segments in which the effect of the exogenous variables is constant. As each segment within a cluster has its own regression coefficients to model the impact of the covariates, we analyze how these coefficients evolve to understand the changes in the cluster. We present the regression mixture model and the parameter estimation using the EM algorithm, before demonstrating the benefits of the model on both simulated and real data. Thanks to a five-year dataset of the ridership in the Paris public transport system, we analyze the impact of the pandemic, not only in terms of the number of travelers but also on the weekly commute. We further analyze the specific changes that the pandemic caused inside each cluster.
- [1894] arXiv:2402.12393 (cross-list from cs.HC) [ pdf , ps , html , other ]
-
Title: On Automating Video Game Regression Testing by Planning and LearningSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: In this paper, we propose a method and workflow for automating regression testing of certain video game aspects using automated planning and incremental action model learning techniques. The basic idea is to use detailed game logs and incremental action model learning techniques to maintain a formal model in the planning domain description language (PDDL) of the gameplay mechanics. The workflow enables efficient cooperation of game developers without any experience with PDDL or other formal systems and a person experienced with PDDL modeling but no game development skills. We describe the method and workflow in general and then demonstrate it on a concrete proof-of-concept example -- a simple role-playing game provided as one of the tutorial projects in the popular game development engine Unity. This paper presents the first step towards minimizing or even eliminating the need for a modeling expert in the workflow, thus making automated planning accessible to a broader audience.
- [1895] arXiv:2402.12394 (cross-list from cs.HC) [ pdf , ps , html , other ]
-
Title: Improving Model's Interpretability and Reliability using BiomarkersGautam Rajendrakumar Gare , Tom Fox , Beam Chansangavej , Amita Krishnan , Ricardo Luis Rodriguez , Bennett P deBoisblanc , Deva Kannan Ramanan , John Michael GaleottiComments: Accepted at BIAS 2023 ConferenceSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Abstract: Accurate and interpretable diagnostic models are crucial in the safety-critical field of medicine. We investigate the interpretability of our proposed biomarker-based lung ultrasound diagnostic pipeline to enhance clinicians' diagnostic capabilities. The objective of this study is to assess whether explanations from a decision tree classifier, utilizing biomarkers, can improve users' ability to identify inaccurate model predictions compared to conventional saliency maps. Our findings demonstrate that decision tree explanations, based on clinically established biomarkers, can assist clinicians in detecting false positives, thus improving the reliability of diagnostic models in medicine.
- [1896] arXiv:2402.12396 (cross-list from astro-ph.HE) [ pdf , ps , html , other ]
-
Title: Toward using GANs in astrophysical Monte-Carlo simulationsComments: Proceedings of ADASS XXXIII (2023)Subjects: High Energy Astrophysical Phenomena (astro-ph.HE) ; Artificial Intelligence (cs.AI)
Abstract: Accurate modelling of spectra produced by X-ray sources requires the use of Monte-Carlo simulations. These simulations need to evaluate physical processes, such as those occurring in accretion processes around compact objects by sampling a number of different probability distributions. This is computationally time-consuming and could be sped up if replaced by neural networks. We demonstrate, on an example of the Maxwell-Jüttner distribution that describes the speed of relativistic electrons, that the generative adversarial network (GAN) is capable of statistically replicating the distribution. The average value of the Kolmogorov-Smirnov test is 0.5 for samples generated by the neural network, showing that the generated distribution cannot be distinguished from the true distribution.
- [1897] arXiv:2402.12399 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Turn Waste into Worth: Rectifying Top-$k$ Router of MoEZhiyuan Zeng , Qipeng Guo , Zhaoye Fei , Zhangyue Yin , Yunhua Zhou , Linyang Li , Tianxiang Sun , Hang Yan , Dahua Lin , Xipeng QiuSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Sparse Mixture of Experts (MoE) models are popular for training large language models due to their computational efficiency. However, the commonly used top-$k$ routing mechanism suffers from redundancy computation and memory costs due to the unbalanced routing. Some experts are overflow, where the exceeding tokens are dropped. While some experts are vacant, which are padded with zeros, negatively impacting model performance. To address the dropped tokens and padding, we propose the Rectify-Router, comprising the Intra-GPU Rectification and the Fill-in Rectification. The Intra-GPU Rectification handles dropped tokens, efficiently routing them to experts within the GPU where they are located to avoid inter-GPU communication. The Fill-in Rectification addresses padding by replacing padding tokens with the tokens that have high routing scores. Our experimental results demonstrate that the Intra-GPU Rectification and the Fill-in Rectification effectively handle dropped tokens and padding, respectively. Furthermore, the combination of them achieves superior performance, surpassing the accuracy of the vanilla top-1 router by 4.7%.
- [1898] arXiv:2402.12405 (cross-list from q-bio.GN) [ pdf , ps , html , other ]
-
Title: scInterpreter: Training Large Language Models to Interpret scRNA-seq Data for Cell Type AnnotationComments: 4 pages, submitted to FCSSubjects: Genomics (q-bio.GN) ; Artificial Intelligence (cs.AI)
Abstract: Despite the inherent limitations of existing Large Language Models in directly reading and interpreting single-cell omics data, they demonstrate significant potential and flexibility as the Foundation Model. This research focuses on how to train and adapt the Large Language Model with the capability to interpret and distinguish cell types in single-cell RNA sequencing data. Our preliminary research results indicate that these foundational models excel in accurately categorizing known cell types, demonstrating the potential of the Large Language Models as effective tools for uncovering new biological insights.
- [1899] arXiv:2402.12406 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Teacher as a Lenient Expert: Teacher-Agnostic Data-Free Knowledge DistillationComments: Accepted in AAAI-2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Data-free knowledge distillation (DFKD) aims to distill pretrained knowledge to a student model with the help of a generator without using original data. In such data-free scenarios, achieving stable performance of DFKD is essential due to the unavailability of validation data. Unfortunately, this paper has discovered that existing DFKD methods are quite sensitive to different teacher models, occasionally showing catastrophic failures of distillation, even when using well-trained teacher models. Our observation is that the generator in DFKD is not always guaranteed to produce precise yet diverse samples using the existing representative strategy of minimizing both class-prior and adversarial losses. Through our empirical study, we focus on the fact that class-prior not only decreases the diversity of generated samples, but also cannot completely address the problem of generating unexpectedly low-quality samples depending on teacher models. In this paper, we propose the teacher-agnostic data-free knowledge distillation (TA-DFKD) method, with the goal of more robust and stable performance regardless of teacher models. Our basic idea is to assign the teacher model a lenient expert role for evaluating samples, rather than a strict supervisor that enforces its class-prior on the generator. Specifically, we design a sample selection approach that takes only clean samples verified by the teacher model without imposing restrictions on the power of generating diverse samples. Through extensive experiments, we show that our method successfully achieves both robustness and training stability across various teacher models, while outperforming the existing DFKD methods.
- [1900] arXiv:2402.12408 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: ModelGPT: Unleashing LLM's Capabilities for Tailored Model GenerationSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: The rapid advancement of Large Language Models (LLMs) has revolutionized various sectors by automating routine tasks, marking a step toward the realization of Artificial General Intelligence (AGI). However, they still struggle to accommodate the diverse and specific needs of users and simplify the utilization of AI models for the average user. In response, we propose ModelGPT, a novel framework designed to determine and generate AI models specifically tailored to the data or task descriptions provided by the user, leveraging the capabilities of LLMs. Given user requirements, ModelGPT is able to provide tailored models at most 270x faster than the previous paradigms (e.g. all-parameter or LoRA finetuning). Comprehensive experiments on NLP, CV, and Tabular datasets attest to the effectiveness of our framework in making AI models more accessible and user-friendly. Our code is available at this https URL .
- [1901] arXiv:2402.12411 (cross-list from cs.SI) [ pdf , ps , html , other ]
-
Title: Deep Structural Knowledge Exploitation and Synergy for Estimating Node Importance Value on Heterogeneous Information NetworksComments: Accepted by AAAI 2024Subjects: Social and Information Networks (cs.SI) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Node importance estimation problem has been studied conventionally with homogeneous network topology analysis. To deal with network heterogeneity, a few recent methods employ graph neural models to automatically learn diverse sources of information. However, the major concern revolves around that their full adaptive learning process may lead to insufficient information exploration, thereby formulating the problem as the isolated node value prediction with underperformance and less interpretability. In this work, we propose a novel learning framework: SKES. Different from previous automatic learning designs, SKES exploits heterogeneous structural knowledge to enrich the informativeness of node representations. Based on a sufficiently uninformative reference, SKES estimates the importance value for any input node, by quantifying its disparity against the reference. This establishes an interpretable node importance computation paradigm. Furthermore, SKES dives deep into the understanding that "nodes with similar characteristics are prone to have similar importance values" whilst guaranteeing that such informativeness disparity between any different nodes is orderly reflected by the embedding distance of their associated latent features. Extensive experiments on three widely-evaluated benchmarks demonstrate the performance superiority of SKES over several recent competing methods.
- [1902] arXiv:2402.12412 (cross-list from cs.HC) [ pdf , ps , html , other ]
-
Title: Dynamic and Super-Personalized Media Ecosystem Driven by Generative AI: Unpredictable Plays Never Repeating The SameComments: 13 pages, 7 figuresSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Multimedia (cs.MM); Signal Processing (eess.SP)
Abstract: This paper introduces a media service model that exploits artificial intelligence (AI) video generators at the receive end. This proposal deviates from the traditional multimedia ecosystem, completely relying on in-house production, by shifting part of the content creation onto the receiver. We bring a semantic process into the framework, allowing the distribution network to provide service elements that prompt the content generator, rather than distributing encoded data of fully finished programs. The service elements include fine-tailored text descriptions, lightweight image data of some objects, or application programming interfaces, comprehensively referred to as semantic sources, and the user terminal translates the received semantic data into video frames. Empowered by the random nature of generative AI, the users could then experience super-personalized services accordingly. The proposed idea incorporates the situations in which the user receives different service providers' element packages; a sequence of packages over time, or multiple packages at the same time. Given promised in-context coherence and content integrity, the combinatory dynamics will amplify the service diversity, allowing the users to always chance upon new experiences. This work particularly aims at short-form videos and advertisements, which the users would easily feel fatigued by seeing the same frame sequence every time. In those use cases, the content provider's role will be recast as scripting semantic sources, transformed from a thorough producer. Overall, this work explores a new form of media ecosystem facilitated by receiver-embedded generative models, featuring both random content dynamics and enhanced delivery efficiency simultaneously.
- [1903] arXiv:2402.12416 (cross-list from cs.MA) [ pdf , ps , html , other ]
-
Title: Aligning Individual and Collective Objectives in Multi-Agent CooperationComments: 15 pagesSubjects: Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI)
Abstract: In the field of multi-agent learning, the challenge of mixed-motive cooperation is pronounced, given the inherent contradictions between individual and collective goals. Current research in this domain primarily focuses on incorporating domain knowledge into rewards or introducing additional mechanisms to foster cooperation. However, many of these methods suffer from the drawbacks of manual design costs and the lack of a theoretical grounding convergence procedure to the solution. To address this gap, we approach the mixed-motive game by modeling it as a differentiable game to study learning dynamics. We introduce a novel optimization method named Altruistic Gradient Adjustment (AgA) that employs gradient adjustments to novelly align individual and collective objectives. Furthermore, we provide theoretical proof that the selection of an appropriate alignment weight in AgA can accelerate convergence towards the desired solutions while effectively avoiding the undesired ones. The visualization of learning dynamics effectively demonstrates that AgA successfully achieves alignment between individual and collective objectives. Additionally, through evaluations conducted on established mixed-motive benchmarks such as the public good game, Cleanup, Harvest, and our modified mixed-motive SMAC environment, we validate AgA's capability to facilitate altruistic and fair collaboration.
- [1904] arXiv:2402.12417 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Predicting trucking accidents with truck drivers 'safety climate perception across companies: A transfer learning approachComments: submitted to journal: accident analysis and preventionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: There is a rising interest in using artificial intelligence (AI)-powered safety analytics to predict accidents in the trucking industry. Companies may face the practical challenge, however, of not having enough data to develop good safety analytics models. Although pretrained models may offer a solution for such companies, existing safety research using transfer learning has mostly focused on computer vision and natural language processing, rather than accident analytics. To fill the above gap, we propose a pretrain-then-fine-tune transfer learning approach to help any company leverage other companies' data to develop AI models for a more accurate prediction of accident risk. We also develop SafeNet, a deep neural network algorithm for classification tasks suitable for accident prediction. Using the safety climate survey data from seven trucking companies with different data sizes, we show that our proposed approach results in better model performance compared to training the model from scratch using only the target company's data. We also show that for the transfer learning model to be effective, the pretrained model should be developed with larger datasets from diverse sources. The trucking industry may, thus, consider pooling safety analytics data from a wide range of companies to develop pretrained models and share them within the industry for better knowledge and resource transfer. The above contributions point to the promise of advanced safety analytics to make the industry safer and more sustainable.
- [1905] arXiv:2402.12418 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Beyond Uniform Scaling: Exploring Depth Heterogeneity in Neural ArchitecturesComments: Accepted At ICLR 2024 (Tiny Paper Track)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Abstract: Conventional scaling of neural networks typically involves designing a base network and growing different dimensions like width, depth, etc. of the same by some predefined scaling factors. We introduce an automated scaling approach leveraging second-order loss landscape information. Our method is flexible towards skip connections a mainstay in modern vision transformers. Our training-aware method jointly scales and trains transformers without additional training iterations. Motivated by the hypothesis that not all neurons need uniform depth complexity, our approach embraces depth heterogeneity. Extensive evaluations on DeiT-S with ImageNet100 show a 2.5% accuracy gain and 10% parameter efficiency improvement over conventional scaling. Scaled networks demonstrate superior performance upon training small scale datasets from scratch. We introduce the first intact scaling mechanism for vision transformers, a step towards efficient model scaling.
- [1906] arXiv:2402.12419 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: EBFT: Effective and Block-Wise Fine-Tuning for Sparse LLMsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Existing methods for fine-tuning sparse LLMs often suffer from resource-intensive requirements and high retraining costs. Additionally, many fine-tuning methods often rely on approximations or heuristic optimization strategies, which may lead to suboptimal solutions. To address these issues, we propose an efficient and fast framework for fine-tuning sparse LLMs based on minimizing reconstruction error. Our approach involves sampling a small dataset for calibration and utilizing backpropagation to iteratively optimize block-wise reconstruction error, on a block-by-block basis, aiming for optimal solutions. Extensive experiments on various benchmarks consistently demonstrate the superiority of our method over other baselines. For instance, on the Wikitext2 dataset with LlamaV1-7B at 70% sparsity, our proposed EBFT achieves a perplexity of 16.88, surpassing the state-of-the-art DSnoT with a perplexity of 75.14. Moreover, with a structured sparsity ratio of 26\%, EBFT achieves a perplexity of 16.27, outperforming LoRA (perplexity 16.44). Furthermore, the fine-tuning process of EBFT for LlamaV1-7B only takes approximately 30 minutes, and the entire framework can be executed on a single 16GB GPU. The source code is available at this https URL .
- [1907] arXiv:2402.12424 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Tables as Images? Exploring the Strengths and Limitations of LLMs on Multimodal Representations of Tabular DataNaihao Deng , Zhenjie Sun , Ruiqi He , Aman Sikka , Yulong Chen , Lin Ma , Yue Zhang , Rada MihalceaSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Abstract: In this paper, we investigate the effectiveness of various LLMs in interpreting tabular data through different prompting strategies and data formats. Our analysis extends across six benchmarks for table-related tasks such as question-answering and fact-checking. We introduce for the first time the assessment of LLMs' performance on image-based table representations. Specifically, we compare five text-based and three image-based table representations, demonstrating the influence of representation and prompting on LLM performance. Our study provides insights into the effective use of LLMs on table-related tasks.
- [1908] arXiv:2402.12451 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: The (R)Evolution of Multimodal Large Language Models: A SurveyDavide Caffagni , Federico Cocchi , Luca Barsellotti , Nicholas Moratelli , Sara Sarto , Lorenzo Baraldi , Lorenzo Baraldi , Marcella Cornia , Rita CucchiaraSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
Abstract: Connecting text and visual modalities plays an essential role in generative intelligence. For this reason, inspired by the success of large language models, significant research efforts are being devoted to the development of Multimodal Large Language Models (MLLMs). These models can seamlessly integrate visual and textual modalities, both as input and output, while providing a dialogue-based interface and instruction-following capabilities. In this paper, we provide a comprehensive review of recent visual-based MLLMs, analyzing their architectural choices, multimodal alignment strategies, and training techniques. We also conduct a detailed analysis of these models across a wide range of tasks, including visual grounding, image generation and editing, visual understanding, and domain-specific applications. Additionally, we compile and describe training datasets and evaluation benchmarks, conducting comparisons among existing models in terms of performance and computational requirements. Overall, this survey offers a comprehensive overview of the current state of the art, laying the groundwork for future MLLMs.
- [1909] arXiv:2402.12479 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: In deep reinforcement learning, a pruned network is a good networkSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Recent work has shown that deep reinforcement learning agents have difficulty in effectively using their network parameters. We leverage prior insights into the advantages of sparse training techniques and demonstrate that gradual magnitude pruning enables agents to maximize parameter effectiveness. This results in networks that yield dramatic performance improvements over traditional networks and exhibit a type of "scaling law", using only a small fraction of the full network parameters.
- [1910] arXiv:2402.12490 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Towards Cross-Domain Continual LearningComments: 12 pages, 2 Figures, 4 Tables. To be published at the IEEE International Conference on Data Engineering (ICDE) 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Continual learning is a process that involves training learning agents to sequentially master a stream of tasks or classes without revisiting past data. The challenge lies in leveraging previously acquired knowledge to learn new tasks efficiently, while avoiding catastrophic forgetting. Existing methods primarily focus on single domains, restricting their applicability to specific problems.
In this work, we introduce a novel approach called Cross-Domain Continual Learning (CDCL) that addresses the limitations of being limited to single supervised domains. Our method combines inter- and intra-task cross-attention mechanisms within a compact convolutional network. This integration enables the model to maintain alignment with features from previous tasks, thereby delaying the data drift that may occur between tasks, while performing unsupervised cross-domain (UDA) between related domains. By leveraging an intra-task-specific pseudo-labeling method, we ensure accurate input pairs for both labeled and unlabeled samples, enhancing the learning process. To validate our approach, we conduct extensive experiments on public UDA datasets, showcasing its positive performance on cross-domain continual learning challenges. Additionally, our work introduces incremental ideas that contribute to the advancement of this field.
We make our code and models available to encourage further exploration and reproduction of our results: \url{ this https URL } - [1911] arXiv:2402.12499 (cross-list from cs.GT) [ pdf , ps , other ]
-
Title: Automated Security Response through Online Learning with Adaptive ConjecturesComments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessibleSubjects: Computer Science and Game Theory (cs.GT) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Systems and Control (eess.SY)
Abstract: We study automated security response for an IT infrastructure and formulate the interaction between an attacker and a defender as a partially observed, non-stationary game. We relax the standard assumption that the game model is correctly specified and consider that each player has a probabilistic conjecture about the model, which may be misspecified in the sense that the true model has probability 0. This formulation allows us to capture uncertainty about the infrastructure and the intents of the players. To learn effective game strategies online, we design a novel method where a player iteratively adapts its conjecture using Bayesian learning and updates its strategy through rollout. We prove that the conjectures converge to best fits, and we provide a bound on the performance improvement that rollout enables with a conjectured model. To characterize the steady state of the game, we propose a variant of the Berk-Nash equilibrium. We present our method through an advanced persistent threat use case. Simulation studies based on testbed measurements show that our method produces effective security strategies that adapt to a changing environment. We also find that our method enables faster convergence than current reinforcement learning techniques.
- [1912] arXiv:2402.12518 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Gaussian Process Neural Additive ModelsComments: Appears at AAAI 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Deep neural networks have revolutionized many fields, but their black-box nature also occasionally prevents their wider adoption in fields such as healthcare and finance, where interpretable and explainable models are required. The recent development of Neural Additive Models (NAMs) is a significant step in the direction of interpretable deep learning for tabular datasets. In this paper, we propose a new subclass of NAMs that use a single-layer neural network construction of the Gaussian process via random Fourier features, which we call Gaussian Process Neural Additive Models (GP-NAM). GP-NAMs have the advantage of a convex objective function and number of trainable parameters that grows linearly with feature dimensionality. It suffers no loss in performance compared to deeper NAM approaches because GPs are well-suited for learning complex non-parametric univariate functions. We demonstrate the performance of GP-NAM on several tabular datasets, showing that it achieves comparable or better performance in both classification and regression tasks with a large reduction in the number of parameters.
- [1913] arXiv:2402.12525 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: LangXAI: Integrating Large Vision Models for Generating Textual Explanations to Enhance Explainability in Visual Perception TasksTruong Thanh Hung Nguyen , Tobias Clement , Phuc Truong Loc Nguyen , Nils Kemmerzell , Van Binh Truong , Vo Thanh Khang Nguyen , Mohamed Abdelaal , Hung CaoSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: LangXAI is a framework that integrates Explainable Artificial Intelligence (XAI) with advanced vision models to generate textual explanations for visual recognition tasks. Despite XAI advancements, an understanding gap persists for end-users with limited domain knowledge in artificial intelligence and computer vision. LangXAI addresses this by furnishing text-based explanations for classification, object detection, and semantic segmentation model outputs to end-users. Preliminary results demonstrate LangXAI's enhanced plausibility, with high BERTScore across tasks, fostering a more transparent and reliable AI framework on vision tasks for end-users.
- [1914] arXiv:2402.12527 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: The Edge-of-Reach Problem in Offline Model-Based Reinforcement LearningComments: Code open-sourced at: this https URLSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Offline reinforcement learning aims to enable agents to be trained from pre-collected datasets, however, this comes with the added challenge of estimating the value of behavior not covered in the dataset. Model-based methods offer a solution by allowing agents to collect additional synthetic data via rollouts in a learned dynamics model. The prevailing theoretical understanding is that this can then be viewed as online reinforcement learning in an approximate dynamics model, and any remaining gap is therefore assumed to be due to the imperfect dynamics model. Surprisingly, however, we find that if the learned dynamics model is replaced by the true error-free dynamics, existing model-based methods completely fail. This reveals a major misconception. Our subsequent investigation finds that the general procedure used in model-based algorithms results in the existence of a set of edge-of-reach states which trigger pathological value overestimation and collapse in Bellman-based algorithms. We term this the edge-of-reach problem. Based on this, we fill some gaps in existing theory and also explain how prior model-based methods are inadvertently addressing the true underlying edge-of-reach problem. Finally, we propose Reach-Aware Value Learning (RAVL), a simple and robust method that directly addresses the edge-of-reach problem and achieves strong performance across both proprioceptive and pixel-based benchmarks. Code open-sourced at: this https URL .
- [1915] arXiv:2402.12530 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Parallel Structures in Pre-training Data Yield In-Context LearningSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Pre-trained language models (LMs) are capable of in-context learning (ICL): they can adapt to a task with only a few examples given in the prompt without any parameter update. However, it is unclear where this capability comes from as there is a stark distribution shift between pre-training text and ICL prompts. In this work, we study what patterns of the pre-training data contribute to ICL. We find that LMs' ICL ability depends on $\textit{parallel structures}$ in the pre-training data -- pairs of phrases following similar templates in the same context window. Specifically, we detect parallel structures by checking whether training on one phrase improves prediction of the other, and conduct ablation experiments to study their effect on ICL. We show that removing parallel structures in the pre-training data reduces LMs' ICL accuracy by 51% (vs 2% from random ablation). This drop persists even when excluding common patterns such as n-gram repetitions and long-range dependency, showing the diversity and generality of parallel structures. A closer look at the detected parallel structures indicates that they cover diverse linguistic tasks and span long distances in the data.
- [1916] arXiv:2402.12551 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Landmark-based Localization using Stereo Vision and Deep Learning in GPS-Denied Battlefield EnvironmentComments: arXiv admin note: text overlap with arXiv:2402.12320Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Localization in a battlefield environment is increasingly challenging as GPS connectivity is often denied or unreliable, and physical deployment of anchor nodes across wireless networks for localization can be difficult in hostile battlefield terrain. Existing range-free localization methods rely on radio-based anchors and their average hop distance which suffers from accuracy and stability in dynamic and sparse wireless network topology. Vision-based methods like SLAM and Visual Odometry use expensive sensor fusion techniques for map generation and pose estimation. This paper proposes a novel framework for localization in non-GPS battlefield environments using only the passive camera sensors and considering naturally existing or artificial landmarks as anchors. The proposed method utilizes a customcalibrated stereo vision camera for distance estimation and the YOLOv8s model, which is trained and fine-tuned with our real-world dataset for landmark recognition. The depth images are generated using an efficient stereomatching algorithm, and distances to landmarks are determined by extracting the landmark depth feature utilizing a bounding box predicted by the landmark recognition model. The position of the unknown node is then obtained using the efficient least square algorithm and then optimized using the L-BFGS-B (limited-memory quasi-Newton code for bound-constrained optimization) method. Experimental results demonstrate that our proposed framework performs better than existing anchorbased DV-Hop algorithms and competes with the most efficient vision-based algorithms in terms of localization error (RMSE).
- [1917] arXiv:2402.12560 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: CausalGym: Benchmarking causal interpretability methods on linguistic tasksComments: 9 pages main text, 26 pages totalSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Language models (LMs) have proven to be powerful tools for psycholinguistic research, but most prior work has focused on purely behavioural measures (e.g., surprisal comparisons). At the same time, research in model interpretability has begun to illuminate the abstract causal mechanisms shaping LM behavior. To help bring these strands of research closer together, we introduce CausalGym. We adapt and expand the SyntaxGym suite of tasks to benchmark the ability of interpretability methods to causally affect model behaviour. To illustrate how CausalGym can be used, we study the pythia models (14M--6.9B) and assess the causal efficacy of a wide range of interpretability methods, including linear probing and distributed alignment search (DAS). We find that DAS outperforms the other methods, and so we use it to study the learning trajectory of two difficult linguistic phenomena in pythia-1b: negative polarity item licensing and filler--gap dependencies. Our analysis shows that the mechanism implementing both of these tasks is learned in discrete stages, not gradually.
- [1918] arXiv:2402.12563 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Confidence Matters: Revisiting Intrinsic Self-Correction Capabilities of Large Language ModelsComments: 12 figures, 9 tablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The recent success of Large Language Models (LLMs) has catalyzed an increasing interest in their self-correction capabilities. This paper presents a comprehensive investigation into the intrinsic self-correction of LLMs, attempting to address the ongoing debate about its feasibility. Our research has identified an important latent factor - the "confidence" of LLMs - during the self-correction process. Overlooking this factor may cause the models to over-criticize themselves, resulting in unreliable conclusions regarding the efficacy of self-correction. We have experimentally observed that LLMs possess the capability to understand the "confidence" in their own responses. It motivates us to develop an "If-or-Else" (IoE) prompting framework, designed to guide LLMs in assessing their own "confidence", facilitating intrinsic self-corrections. We conduct extensive experiments and demonstrate that our IoE-based Prompt can achieve a consistent improvement regarding the accuracy of self-corrected responses over the initial answers. Our study not only sheds light on the underlying factors affecting self-correction in LLMs, but also introduces a practical framework that utilizes the IoE prompting principle to efficiently improve self-correction capabilities with "confidence". The code is available at this https URL .
- [1919] arXiv:2402.12570 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Offline Multi-task Transfer RL with Representational PenalizationSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: We study the problem of representation transfer in offline Reinforcement Learning (RL), where a learner has access to episodic data from a number of source tasks collected a priori, and aims to learn a shared representation to be used in finding a good policy for a target task. Unlike in online RL where the agent interacts with the environment while learning a policy, in the offline setting there cannot be such interactions in either the source tasks or the target task; thus multi-task offline RL can suffer from incomplete coverage.
We propose an algorithm to compute pointwise uncertainty measures for the learnt representation, and establish a data-dependent upper bound for the suboptimality of the learnt policy for the target task. Our algorithm leverages the collective exploration done by source tasks to mitigate poor coverage at some points by a few tasks, thus overcoming the limitation of needing uniformly good coverage for a meaningful transfer by existing offline algorithms. We complement our theoretical results with empirical evaluation on a rich-observation MDP which requires many samples for complete coverage. Our findings illustrate the benefits of penalizing and quantifying the uncertainty in the learnt representation. - [1920] arXiv:2402.12572 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: FairProof : Confidential and Certifiable Fairness for Neural NetworksSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: Machine learning models are increasingly used in societal applications, yet legal and privacy concerns demand that they very often be kept confidential. Consequently, there is a growing distrust about the fairness properties of these models in the minds of consumers, who are often at the receiving end of model predictions. To this end, we propose FairProof - a system that uses Zero-Knowledge Proofs (a cryptographic primitive) to publicly verify the fairness of a model, while maintaining confidentiality. We also propose a fairness certification algorithm for fully-connected neural networks which is befitting to ZKPs and is used in this system. We implement FairProof in Gnark and demonstrate empirically that our system is practically feasible.
- [1921] arXiv:2402.12598 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Graph-based Virtual Sensing from Sparse and Partial Multivariate ObservationsComments: Accepted at ICLR 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Virtual sensing techniques allow for inferring signals at new unmonitored locations by exploiting spatio-temporal measurements coming from physical sensors at different locations. However, as the sensor coverage becomes sparse due to costs or other constraints, physical proximity cannot be used to support interpolation. In this paper, we overcome this challenge by leveraging dependencies between the target variable and a set of correlated variables (covariates) that can frequently be associated with each location of interest. From this viewpoint, covariates provide partial observability, and the problem consists of inferring values for unobserved channels by exploiting observations at other locations to learn how such variables can correlate. We introduce a novel graph-based methodology to exploit such relationships and design a graph deep learning architecture, named GgNet, implementing the framework. The proposed approach relies on propagating information over a nested graph structure that is used to learn dependencies between variables as well as locations. GgNet is extensively evaluated under different virtual sensing scenarios, demonstrating higher reconstruction accuracy compared to the state-of-the-art.
- [1922] arXiv:2402.12617 (cross-list from cs.CR) [ pdf , ps , html , other ]
-
Title: Generative AI Security: Challenges and CountermeasuresSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY); Machine Learning (cs.LG)
Abstract: Generative AI's expanding footprint across numerous industries has led to both excitement and increased scrutiny. This paper delves into the unique security challenges posed by Generative AI, and outlines potential research directions for managing these risks.
- [1923] arXiv:2402.12624 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Efficient Parameter Mining and Freezing for Continual Object DetectionComments: In Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP, ISBN 978-989-758-679-8, ISSN 2184-4321, pages 466-474Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Continual Object Detection is essential for enabling intelligent agents to interact proactively with humans in real-world settings. While parameter-isolation strategies have been extensively explored in the context of continual learning for classification, they have yet to be fully harnessed for incremental object detection scenarios. Drawing inspiration from prior research that focused on mining individual neuron responses and integrating insights from recent developments in neural pruning, we proposed efficient ways to identify which layers are the most important for a network to maintain the performance of a detector across sequential updates. The presented findings highlight the substantial advantages of layer-level parameter isolation in facilitating incremental learning within object detection models, offering promising avenues for future research and application in real-world scenarios.
- [1924] arXiv:2402.12627 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: A Comprehensive Review of Machine Learning Advances on Data Change: A Cross-Field PerspectiveSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Recent artificial intelligence (AI) technologies show remarkable evolution in various academic fields and industries. However, in the real world, dynamic data lead to principal challenges for deploying AI models. An unexpected data change brings about severe performance degradation in AI models. We identify two major related research fields, domain shift and concept drift according to the setting of the data change. Although these two popular research fields aim to solve distribution shift and non-stationary data stream problems, the underlying properties remain similar which also encourages similar technical approaches. In this review, we regroup domain shift and concept drift into a single research problem, namely the data change problem, with a systematic overview of state-of-the-art methods in the two research fields. We propose a three-phase problem categorization scheme to link the key ideas in the two technical fields. We thus provide a novel scope for researchers to explore contemporary technical strategies, learn industrial applications, and identify future directions for addressing data change challenges.
- [1925] arXiv:2402.12646 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Training Artificial Neural Networks by Coordinate Search AlgorithmEhsan Rokhsatyazdi , Shahryar Rahnamayan , Sevil Zanjani Miyandoab , Azam Asilian Bidgoli , H.R. TizhooshComments: 7 pages, 9 figuresJournal-ref: 2023 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1540-1546. IEEE, 2023Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Training Artificial Neural Networks poses a challenging and critical problem in machine learning. Despite the effectiveness of gradient-based learning methods, such as Stochastic Gradient Descent (SGD), in training neural networks, they do have several limitations. For instance, they require differentiable activation functions, and cannot optimize a model based on several independent non-differentiable loss functions simultaneously; for example, the F1-score, which is used during testing, can be used during training when a gradient-free optimization algorithm is utilized. Furthermore, the training in any DNN can be possible with a small size of the training dataset. To address these concerns, we propose an efficient version of the gradient-free Coordinate Search (CS) algorithm, an instance of General Pattern Search methods, for training neural networks. The proposed algorithm can be used with non-differentiable activation functions and tailored to multi-objective/multi-loss problems. Finding the optimal values for weights of ANNs is a large-scale optimization problem. Therefore instead of finding the optimal value for each variable, which is the common technique in classical CS, we accelerate optimization and convergence by bundling the weights. In fact, this strategy is a form of dimension reduction for optimization problems. Based on the experimental results, the proposed method, in some cases, outperforms the gradient-based approach, particularly, in situations with insufficient labeled training data. The performance plots demonstrate a high convergence rate, highlighting the capability of our suggested method to find a reasonable solution with fewer function calls. As of now, the only practical and efficient way of training ANNs with hundreds of thousands of weights is gradient-based algorithms such as SGD or Adam. In this paper we introduce an alternative method for training ANN.
- [1926] arXiv:2402.12656 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: HyperMoE: Paying Attention to Unselected Experts in Mixture of Experts via Dynamic TransferSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The Mixture of Experts (MoE) for language models has been proven effective in augmenting the capacity of models by dynamically routing each input token to a specific subset of experts for processing. Despite the success, most existing methods face a challenge for balance between sparsity and the availability of expert knowledge: enhancing performance through increased use of expert knowledge often results in diminishing sparsity during expert selection. To mitigate this contradiction, we propose HyperMoE, a novel MoE framework built upon Hypernetworks. This framework integrates the computational processes of MoE with the concept of knowledge transferring in multi-task learning. Specific modules generated based on the information of unselected experts serve as supplementary information, which allows the knowledge of experts not selected to be used while maintaining selection sparsity. Our comprehensive empirical evaluations across multiple datasets and backbones establish that HyperMoE significantly outperforms existing MoE methods under identical conditions concerning the number of experts.
- [1927] arXiv:2402.12659 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: The FinBen: An Holistic Financial Benchmark for Large Language ModelsQianqian Xie , Weiguang Han , Zhengyu Chen , Ruoyu Xiang , Xiao Zhang , Yueru He , Mengxi Xiao , Dong Li , Yongfu Dai , Duanyu Feng , Yijing Xu , Haoqiang Kang , Ziyan Kuang , Chenhan Yuan , Kailai Yang , Zheheng Luo , Tianlin Zhang , Zhiwei Liu , Guojun Xiong , Zhiyang Deng , Yuechen Jiang , Zhiyuan Yao , Haohang Li , Yangyang Yu , Gang Hu , Jiajia Huang , Xiao-Yang Liu , Alejandro Lopez-Lira , Benyou Wang , Yanzhao Lai , Hao Wang , Min Peng , Sophia Ananiadou , Jimin HuangComments: 19 pages, 10 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)
Abstract: LLMs have transformed NLP and shown promise in various fields, yet their potential in finance is underexplored due to a lack of thorough evaluations and the complexity of financial tasks. This along with the rapid development of LLMs, highlights the urgent need for a systematic financial evaluation benchmark for LLMs. In this paper, we introduce FinBen, the first comprehensive open-sourced evaluation benchmark, specifically designed to thoroughly assess the capabilities of LLMs in the financial domain. FinBen encompasses 35 datasets across 23 financial tasks, organized into three spectrums of difficulty inspired by the Cattell-Horn-Carroll theory, to evaluate LLMs' cognitive abilities in inductive reasoning, associative memory, quantitative reasoning, crystallized intelligence, and more. Our evaluation of 15 representative LLMs, including GPT-4, ChatGPT, and the latest Gemini, reveals insights into their strengths and limitations within the financial domain. The findings indicate that GPT-4 leads in quantification, extraction, numerical reasoning, and stock trading, while Gemini shines in generation and forecasting; however, both struggle with complex extraction and forecasting, showing a clear need for targeted enhancements. Instruction tuning boosts simple task performance but falls short in improving complex reasoning and forecasting abilities. FinBen seeks to continuously evaluate LLMs in finance, fostering AI development with regular updates of tasks and models.
- [1928] arXiv:2402.12720 (cross-list from cs.CR) [ pdf , ps , html , other ]
-
Title: Revisiting the Information Capacity of Neural Network Watermarks: Upper Bound Estimation and BeyondComments: Accepted by AAAI 2024Subjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: To trace the copyright of deep neural networks, an owner can embed its identity information into its model as a watermark. The capacity of the watermark quantify the maximal volume of information that can be verified from the watermarked model. Current studies on capacity focus on the ownership verification accuracy under ordinary removal attacks and fail to capture the relationship between robustness and fidelity. This paper studies the capacity of deep neural network watermarks from an information theoretical perspective. We propose a new definition of deep neural network watermark capacity analogous to channel capacity, analyze its properties, and design an algorithm that yields a tight estimation of its upper bound under adversarial overwriting. We also propose a universal non-invasive method to secure the transmission of the identity message beyond capacity by multiple rounds of ownership verification. Our observations provide evidence for neural network owners and defenders that are curious about the tradeoff between the integrity of their ownership and the performance degradation of their products.
- [1929] arXiv:2402.12721 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: PAC-FNO: Parallel-Structured All-Component Fourier Neural Operators for Recognizing Low-Quality ImagesJinsung Jeon , Hyundong Jin , Jonghyun Choi , Sanghyun Hong , Dongeun Lee , Kookjin Lee , Noseong ParkComments: Accepted at ICLR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: A standard practice in developing image recognition models is to train a model on a specific image resolution and then deploy it. However, in real-world inference, models often encounter images different from the training sets in resolution and/or subject to natural variations such as weather changes, noise types and compression artifacts. While traditional solutions involve training multiple models for different resolutions or input variations, these methods are computationally expensive and thus do not scale in practice. To this end, we propose a novel neural network model, parallel-structured and all-component Fourier neural operator (PAC-FNO), that addresses the problem. Unlike conventional feed-forward neural networks, PAC-FNO operates in the frequency domain, allowing it to handle images of varying resolutions within a single model. We also propose a two-stage algorithm for training PAC-FNO with a minimal modification to the original, downstream model. Moreover, the proposed PAC-FNO is ready to work with existing image recognition models. Extensively evaluating methods with seven image recognition benchmarks, we show that the proposed PAC-FNO improves the performance of existing baseline models on images with various resolutions by up to 77.1% and various types of natural variations in the images at inference.
- [1930] arXiv:2402.12727 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Diffusion Posterior Sampling is Computationally IntractableSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Statistics Theory (math.ST); Machine Learning (stat.ML)
Abstract: Diffusion models are a remarkably effective way of learning and sampling from a distribution $p(x)$. In posterior sampling, one is also given a measurement model $p(y \mid x)$ and a measurement $y$, and would like to sample from $p(x \mid y)$. Posterior sampling is useful for tasks such as inpainting, super-resolution, and MRI reconstruction, so a number of recent works have given algorithms to heuristically approximate it; but none are known to converge to the correct distribution in polynomial time.
In this paper we show that posterior sampling is \emph{computationally intractable}: under the most basic assumption in cryptography -- that one-way functions exist -- there are instances for which \emph{every} algorithm takes superpolynomial time, even though \emph{unconditional} sampling is provably fast. We also show that the exponential-time rejection sampling algorithm is essentially optimal under the stronger plausible assumption that there are one-way functions that take exponential time to invert. - [1931] arXiv:2402.12728 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: Modality-Aware Integration with Large Language Models for Knowledge-based Visual Question AnsweringComments: 8 pages,3 figures and 1 page appendix; The processed graphs and codes will be avalibaleSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Abstract: Knowledge-based visual question answering (KVQA) has been extensively studied to answer visual questions with external knowledge, e.g., knowledge graphs (KGs). While several attempts have been proposed to leverage large language models (LLMs) as an implicit knowledge source, it remains challenging since LLMs may generate hallucinations. Moreover, multiple knowledge sources, e.g., images, KGs and LLMs, cannot be readily aligned for complex scenarios. To tackle these, we present a novel modality-aware integration with LLMs for KVQA (MAIL). It carefully leverages multimodal knowledge for both image understanding and knowledge reasoning. Specifically, (i) we propose a two-stage prompting strategy with LLMs to densely embody the image into a scene graph with detailed visual features; (ii) We construct a coupled concept graph by linking the mentioned entities with external facts. (iii) A tailored pseudo-siamese graph medium fusion is designed for sufficient multimodal fusion. We utilize the shared mentioned entities in two graphs as mediums to bridge a tight inter-modal exchange, while maximally preserving insightful intra-modal learning by constraining the fusion within mediums. Extensive experiments on two benchmark datasets show the superiority of MAIL with 24x less resources.
- [1932] arXiv:2402.12729 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Scalable and reliable deep transfer learning for intelligent fault detection via multi-scale neural processes embedded with knowledgeSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Deep transfer learning (DTL) is a fundamental method in the field of Intelligent Fault Detection (IFD). It aims to mitigate the degradation of method performance that arises from the discrepancies in data distribution between training set (source domain) and testing set (target domain). Considering the fact that fault data collection is challenging and certain faults are scarce, DTL-based methods face the limitation of available observable data, which reduces the detection performance of the methods in the target domain. Furthermore, DTL-based methods lack comprehensive uncertainty analysis that is essential for building reliable IFD systems. To address the aforementioned problems, this paper proposes a novel DTL-based method known as Neural Processes-based deep transfer learning with graph convolution network (GTNP). Feature-based transfer strategy of GTNP bridges the data distribution discrepancies of source domain and target domain in high-dimensional space. Both the joint modeling based on global and local latent variables and sparse sampling strategy reduce the demand of observable data in the target domain. The multi-scale uncertainty analysis is obtained by using the distribution characteristics of global and local latent variables. Global analysis of uncertainty enables GTNP to provide quantitative values that reflect the complexity of methods and the difficulty of tasks. Local analysis of uncertainty allows GTNP to model uncertainty (confidence of the fault detection result) at each sample affected by noise and bias. The validation of the proposed method is conducted across 3 IFD tasks, consistently showing the superior detection performance of GTNP compared to the other DTL-based methods.
- [1933] arXiv:2402.12730 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: UMBCLU at SemEval-2024 Task 1A and 1C: Semantic Textual Relatedness with and without machine translationComments: Accepted at SemEval 2024 (Colocated with NAACL 2024)Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The aim of SemEval-2024 Task 1, "Semantic Textual Relatedness for African and Asian Languages" is to develop models for identifying semantic textual relatedness (STR) between two sentences using multiple languages (14 African and Asian languages) and settings (supervised, unsupervised, and cross-lingual). Large language models (LLMs) have shown impressive performance on several natural language understanding tasks such as multilingual machine translation (MMT), semantic similarity (STS), and encoding sentence embeddings. Using a combination of LLMs that perform well on these tasks, we developed two STR models, $\textit{TranSem}$ and $\textit{FineSem}$, for the supervised and cross-lingual settings. We explore the effectiveness of several training methods and the usefulness of machine translation. We find that direct fine-tuning on the task is comparable to using sentence embeddings and translating to English leads to better performance for some languages. In the supervised setting, our model performance is better than the official baseline for 3 languages with the remaining 4 performing on par. In the cross-lingual setting, our model performance is better than the baseline for 3 languages (leading to $1^{st}$ place for Africaans and $2^{nd}$ place for Indonesian), is on par for 2 languages and performs poorly on the remaining 7 languages. Our code is publicly available at this https URL .
- [1934] arXiv:2402.12733 (cross-list from cs.IR) [ pdf , ps , html , other ]
-
Title: BMLP: Behavior-aware MLP for Heterogeneous Sequential RecommendationSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI)
Abstract: In real recommendation scenarios, users often have different types of behaviors, such as clicking and buying. Existing research methods show that it is possible to capture the heterogeneous interests of users through different types of behaviors. However, most multi-behavior approaches have limitations in learning the relationship between different behaviors. In this paper, we propose a novel multilayer perceptron (MLP)-based heterogeneous sequential recommendation method, namely behavior-aware multilayer perceptron (BMLP). Specifically, it has two main modules, including a heterogeneous interest perception (HIP) module, which models behaviors at multiple granularities through behavior types and transition relationships, and a purchase intent perception (PIP) module, which adaptively fuses subsequences of auxiliary behaviors to capture users' purchase intent. Compared with mainstream sequence models, MLP is competitive in terms of accuracy and has unique advantages in simplicity and efficiency. Extensive experiments show that BMLP achieves significant improvement over state-of-the-art algorithms on four public datasets. In addition, its pure MLP architecture leads to a linear time complexity.
- [1935] arXiv:2402.12736 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: CST: Calibration Side-Tuning for Parameter and Memory Efficient Transfer LearningSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Achieving a universally high accuracy in object detection is quite challenging, and the mainstream focus in the industry currently lies on detecting specific classes of objects. However, deploying one or multiple object detection networks requires a certain amount of GPU memory for training and storage capacity for inference. This presents challenges in terms of how to effectively coordinate multiple object detection tasks under resource-constrained conditions. This paper introduces a lightweight fine-tuning strategy called Calibration side tuning, which integrates aspects of adapter tuning and side tuning to adapt the successful techniques employed in transformers for use with ResNet. The Calibration side tuning architecture that incorporates maximal transition calibration, utilizing a small number of additional parameters to enhance network performance while maintaining a smooth training process. Furthermore, this paper has conducted an analysis on multiple fine-tuning strategies and have implemented their application within ResNet, thereby expanding the research on fine-tuning strategies for object detection networks. Besides, this paper carried out extensive experiments using five benchmark datasets. The experimental results demonstrated that this method outperforms other compared state-of-the-art techniques, and a better balance between the complexity and performance of the finetune schemes is achieved.
- [1936] arXiv:2402.12738 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Can Large Language Models be Used to Provide Psychological Counselling? An Analysis of GPT-4-Generated Responses Using Role-play DialoguesComments: Accepted as a conference paper at IWSDS 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Abstract: Mental health care poses an increasingly serious challenge to modern societies. In this context, there has been a surge in research that utilizes information technologies to address mental health problems, including those aiming to develop counseling dialogue systems. However, there is a need for more evaluations of the performance of counseling dialogue systems that use large language models. For this study, we collected counseling dialogue data via role-playing scenarios involving expert counselors, and the utterances were annotated with the intentions of the counselors. To determine the feasibility of a dialogue system in real-world counseling scenarios, third-party counselors evaluated the appropriateness of responses from human counselors and those generated by GPT-4 in identical contexts in role-play dialogue data. Analysis of the evaluation results showed that the responses generated by GPT-4 were competitive with those of human counselors.
- [1937] arXiv:2402.12749 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Me LLaMA: Foundation Large Language Models for Medical ApplicationsQianqian Xie , Qingyu Chen , Aokun Chen , Cheng Peng , Yan Hu , Fongci Lin , Xueqing Peng , Jimin Huang , Jeffrey Zhang , Vipina Keloth , Xinyu Zhou , Huan He , Lucila Ohno-Machado , Yonghui Wu , Hua Xu , Jiang BianComments: 21 pages, 3 figures, 8 tablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Recent advancements in large language models (LLMs) such as ChatGPT and LLaMA have hinted at their potential to revolutionize medical applications, yet their application in clinical settings often reveals limitations due to a lack of specialized training on medical-specific data. In response to this challenge, this study introduces Me-LLaMA, a novel medical LLM family that includes foundation models - Me-LLaMA 13/70B, along with their chat-enhanced versions - Me-LLaMA 13/70B-chat, developed through continual pre-training and instruction tuning of LLaMA2 using large medical datasets. Our methodology leverages a comprehensive domain-specific data suite, including a large-scale, continual pre-training dataset with 129B tokens, an instruction tuning dataset with 214k samples, and a new medical evaluation benchmark (MIBE) across six critical medical tasks with 12 datasets. Our extensive evaluation using the MIBE shows that Me-LLaMA models achieve overall better performance than existing open-source medical LLMs in zero-shot, few-shot and supervised learning abilities. With task-specific instruction tuning, Me-LLaMA models outperform ChatGPT on 7 out of 8 datasets and GPT-4 on 5 out of 8 datasets. In addition, we investigated the catastrophic forgetting problem, and our results show that Me-LLaMA models outperform other open-source medical LLMs in mitigating this issue. Me-LLaMA is one of the largest open-source medical foundation LLMs that use both biomedical and clinical data. It exhibits superior performance across both general and medical tasks compared to other open-source medical LLMs, rendering it an attractive choice for medical AI applications. We release our models, datasets, and evaluation scripts at: this https URL .
- [1938] arXiv:2402.12750 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Model Composition for Multimodal Large Language ModelsChi Chen , Yiyang Du , Zheng Fang , Ziyue Wang , Fuwen Luo , Peng Li , Ming Yan , Ji Zhang , Fei Huang , Maosong Sun , Yang LiuComments: Code will be available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Recent developments in Multimodal Large Language Models (MLLMs) have shown rapid progress, moving towards the goal of creating versatile MLLMs that understand inputs from various modalities. However, existing methods typically rely on joint training with paired multimodal instruction data, which is resource-intensive and challenging to extend to new modalities. In this paper, we propose a new paradigm through the model composition of existing MLLMs to create a new model that retains the modal understanding capabilities of each original model. Our basic implementation, NaiveMC, demonstrates the effectiveness of this paradigm by reusing modality encoders and merging LLM parameters. Furthermore, we introduce DAMC to address parameter interference and mismatch issues during the merging process, thereby enhancing the model performance. To facilitate research in this area, we propose MCUB, a benchmark for assessing ability of MLLMs to understand inputs from diverse modalities. Experiments on this benchmark and four other multimodal understanding tasks show significant improvements over baselines, proving that model composition can create a versatile model capable of processing inputs from multiple modalities.
- [1939] arXiv:2402.12760 (cross-list from cs.MM) [ pdf , ps , html , other ]
-
Title: A User-Friendly Framework for Generating Model-Preferred Prompts in Text-to-Image SynthesisComments: Accepted by The 38th Annual AAAI Conference on Artificial Intelligence (AAAI 2024)Subjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Well-designed prompts have demonstrated the potential to guide text-to-image models in generating amazing images. Although existing prompt engineering methods can provide high-level guidance, it is challenging for novice users to achieve the desired results by manually entering prompts due to a discrepancy between novice-user-input prompts and the model-preferred prompts. To bridge the distribution gap between user input behavior and model training datasets, we first construct a novel Coarse-Fine Granularity Prompts dataset (CFP) and propose a novel User-Friendly Fine-Grained Text Generation framework (UF-FGTG) for automated prompt optimization. For CFP, we construct a novel dataset for text-to-image tasks that combines coarse and fine-grained prompts to facilitate the development of automated prompt generation methods. For UF-FGTG, we propose a novel framework that automatically translates user-input prompts into model-preferred prompts. Specifically, we propose a prompt refiner that continually rewrites prompts to empower users to select results that align with their unique needs. Meanwhile, we integrate image-related loss functions from the text-to-image model into the training process of text generation to generate model-preferred prompts. Additionally, we propose an adaptive feature extraction module to ensure diversity in the generated results. Experiments demonstrate that our approach is capable of generating more visually appealing and diverse images than previous state-of-the-art methods, achieving an average improvement of 5% across six quality and aesthetic metrics.
- [1940] arXiv:2402.12782 (cross-list from cs.SE) [ pdf , ps , html , other ]
-
Title: Advancing GenAI Assisted Programming--A Comparative Study on Prompt Efficiency and Code Quality Between GPT-4 and GLM-4Comments: 18 pages, 4 figures, 3 tablesSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: This study aims to explore the best practices for utilizing GenAI as a programming tool, through a comparative analysis between GPT-4 and GLM-4. By evaluating prompting strategies at different levels of complexity, we identify that simplest and straightforward prompting strategy yields best code generation results. Additionally, adding a CoT-like preliminary confirmation step would further increase the success rate. Our results reveal that while GPT-4 marginally outperforms GLM-4, the difference is minimal for average users. In our simplified evaluation model, we see a remarkable 30 to 100-fold increase in code generation efficiency over traditional coding norms. Our GenAI Coding Workshop highlights the effectiveness and accessibility of the prompting methodology developed in this study. We observe that GenAI-assisted coding would trigger a paradigm shift in programming landscape, which necessitates developers to take on new roles revolving around supervising and guiding GenAI, and to focus more on setting high-level objectives and engaging more towards innovation.
- [1941] arXiv:2402.12789 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Fair Classifiers Without Fair Training: An Influence-Guided Data Sampling ApproachSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: A fair classifier should ensure the benefit of people from different groups, while the group information is often sensitive and unsuitable for model training. Therefore, learning a fair classifier but excluding sensitive attributes in the training dataset is important. In this paper, we study learning fair classifiers without implementing fair training algorithms to avoid possible leakage of sensitive information. Our theoretical analyses validate the possibility of this approach, that traditional training on a dataset with an appropriate distribution shift can reduce both the upper bound for fairness disparity and model generalization error, indicating that fairness and accuracy can be improved simultaneously with simply traditional training. We then propose a tractable solution to progressively shift the original training data during training by sampling influential data, where the sensitive attribute of new data is not accessed in sampling or used in training. Extensive experiments on real-world data demonstrate the effectiveness of our proposed algorithm.
- [1942] arXiv:2402.12794 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: Autonomous Reality Modelling for Cultural Heritage Sites employing cooperative quadrupedal robots and unmanned aerial vehiclesSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Nowadays, the use of advanced sensors, such as terrestrial 3D laser scanners, mobile LiDARs and Unmanned Aerial Vehicles (UAV) photogrammetric imaging, has become the prevalent practice for 3D Reality Modeling and digitization of large-scale monuments of Cultural Heritage (CH). In practice, this process is heavily related to the expertise of the surveying team, handling the laborious planning and time-consuming execution of the 3D mapping process that is tailored to the specific requirements and constraints of each site. To minimize human intervention, this paper introduces a novel methodology for autonomous 3D Reality Modeling for CH monuments by employing au-tonomous biomimetic quadrupedal robotic agents and UAVs equipped with the appropriate sensors. These autonomous robotic agents carry out the 3D RM process in a systematic and repeatable ap-proach. The outcomes of this automated process may find applications in digital twin platforms, facilitating secure monitoring and management of cultural heritage sites and spaces, in both indoor and outdoor environments.
- [1943] arXiv:2402.12810 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: PIP-Net: Pedestrian Intention Prediction in the WildSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
Abstract: Accurate pedestrian intention prediction (PIP) by Autonomous Vehicles (AVs) is one of the current research challenges in this field. In this article, we introduce PIP-Net, a novel framework designed to predict pedestrian crossing intentions by AVs in real-world urban scenarios. We offer two variants of PIP-Net designed for different camera mounts and setups. Leveraging both kinematic data and spatial features from the driving scene, the proposed model employs a recurrent and temporal attention-based solution, outperforming state-of-the-art performance. To enhance the visual representation of road users and their proximity to the ego vehicle, we introduce a categorical depth feature map, combined with a local motion flow feature, providing rich insights into the scene dynamics. Additionally, we explore the impact of expanding the camera's field of view, from one to three cameras surrounding the ego vehicle, leading to enhancement in the model's contextual perception. Depending on the traffic scenario and road environment, the model excels in predicting pedestrian crossing intentions up to 4 seconds in advance which is a breakthrough in current research studies in pedestrian intention prediction. Finally, for the first time, we present the Urban-PIP dataset, a customised pedestrian intention prediction dataset, with multi-camera annotations in real-world automated driving scenarios.
- [1944] arXiv:2402.12817 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: On Sensitivity of Learning with Limited Labelled Data to the Effects of Randomness: Impact of Interactions and Systematic ChoicesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: While learning with limited labelled data can improve performance when the labels are lacking, it is also sensitive to the effects of uncontrolled randomness introduced by so-called randomness factors (e.g., varying order of data). We propose a method to systematically investigate the effects of randomness factors while taking the interactions between them into consideration. To measure the true effects of an individual randomness factor, our method mitigates the effects of other factors and observes how the performance varies across multiple runs. Applying our method to multiple randomness factors across in-context learning and fine-tuning approaches on 7 representative text classification tasks and meta-learning on 3 tasks, we show that: 1) disregarding interactions between randomness factors in existing works caused inconsistent findings due to incorrect attribution of the effects of randomness factors, such as disproving the consistent sensitivity of in-context learning to sample order even with random sample selection; and 2) besides mutual interactions, the effects of randomness factors, especially sample order, are also dependent on more systematic choices unexplored in existing works, such as number of classes, samples per class or choice of prompt format.
- [1945] arXiv:2402.12819 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Comparing Specialised Small and General Large Language Models on Text Classification: 100 Labelled Samples to Achieve Break-Even PerformanceSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: When solving NLP tasks with limited labelled data, researchers can either use a general large language model without further update, or use a small number of labelled examples to tune a specialised smaller model. In this work, we address the research gap of how many labelled samples are required for the specialised small models to outperform general large models, while taking the performance variance into consideration. By observing the behaviour of fine-tuning, instruction-tuning, prompting and in-context learning on 7 language models, we identify such performance break-even points across 8 representative text classification tasks of varying characteristics. We show that the specialised models often need only few samples (on average $10 - 1000$) to be on par or better than the general ones. At the same time, the number of required labels strongly depends on the dataset or task characteristics, with this number being significantly lower on multi-class datasets (up to $100$) than on binary datasets (up to $5000$). When performance variance is taken into consideration, the number of required labels increases on average by $100 - 200\%$ and even up to $1500\%$ in specific cases.
- [1946] arXiv:2402.12835 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: PANDA: Preference Adaptation for Enhancing Domain-Specific Abilities of LLMsAn Liu , Zonghan Yang , Zhenhe Zhang , Qingyuan Hu , Peng Li , Ming Yan , Ji Zhang , Fei Huang , Yang LiuSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: While Large language models (LLMs) have demonstrated considerable capabilities across various natural language tasks, they often fall short of the performance achieved by domain-specific state-of-the-art models. One potential approach to enhance domain-specific capabilities of LLMs involves fine-tuning them using corresponding datasets. However, this method can be both resource and time-intensive, and not applicable to closed-source commercial LLMs. In this paper, we propose Preference Adaptation for Enhancing Domain-specific Abilities of LLMs (PANDA), a method designed to augment the domain-specific capabilities of LLMs by leveraging insights from the response preference of expert models without requiring fine-tuning. Our experimental results reveal that PANDA significantly enhances the domain-specific ability of LLMs on text classification and interactive decision tasks. Moreover, LLM with PANDA even outperforms the expert model that being learned on 4 tasks of ScienceWorld. This finding highlights the potential of exploring tuning-free approaches to achieve weak-to-strong generalization.
- [1947] arXiv:2402.12842 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt TuningSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Recent advancements in large language models (LLMs) have raised concerns about inference costs, increasing the need for research into model compression. While knowledge distillation (KD) is a prominent method for this, research on KD for generative language models like LLMs is relatively sparse, and the approach of distilling student-friendly knowledge, which has shown promising performance in KD for classification models, remains unexplored in generative language models. To explore this approach, we propose PromptKD, a simple yet effective method that utilizes prompt tuning - for the first time in KD - to enable generative language models to transfer student-friendly knowledge. Unlike previous works in classification that require fine-tuning the entire teacher model for extracting student-friendly knowledge, PromptKD achieves similar effects by adding a small number of prompt tokens and tuning only the prompt with student guidance. Extensive experiments on instruction-following datasets using the GPT-2 model family show that PromptKD achieves state-of-the-art performance while adding only 0.0007% of the teacher's parameters as prompts. Further analysis suggests that distilling student-friendly knowledge alleviates exposure bias effectively throughout the entire training process, leading to performance enhancements.
- [1948] arXiv:2402.12843 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: SolarPanel Segmentation :Self-Supervised Learning for Imperfect DatasetsSankarshanaa Sagaram , Aditya Kasliwal , Krish Didwania , Laven Srivastava , Pallavi Kailas , Ujjwal VermaComments: Published at ICLR Tiny Paper 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: The increasing adoption of solar energy necessitates advanced methodologies for monitoring and maintenance to ensure optimal performance of solar panel installations. A critical component in this context is the accurate segmentation of solar panels from aerial or satellite imagery, which is essential for identifying operational issues and assessing efficiency. This paper addresses the significant challenges in panel segmentation, particularly the scarcity of annotated data and the labour-intensive nature of manual annotation for supervised learning. We explore and apply Self-Supervised Learning (SSL) to solve these challenges. We demonstrate that SSL significantly enhances model generalization under various conditions and reduces dependency on manually annotated data, paving the way for robust and adaptable solar panel segmentation solutions.
- [1949] arXiv:2402.12846 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: ConVQG: Contrastive Visual Question Generation with Multimodal GuidanceComments: AAAI 2024. Project page at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Asking questions about visual environments is a crucial way for intelligent agents to understand rich multi-faceted scenes, raising the importance of Visual Question Generation (VQG) systems. Apart from being grounded to the image, existing VQG systems can use textual constraints, such as expected answers or knowledge triplets, to generate focused questions. These constraints allow VQG systems to specify the question content or leverage external commonsense knowledge that can not be obtained from the image content only. However, generating focused questions using textual constraints while enforcing a high relevance to the image content remains a challenge, as VQG systems often ignore one or both forms of grounding. In this work, we propose Contrastive Visual Question Generation (ConVQG), a method using a dual contrastive objective to discriminate questions generated using both modalities from those based on a single one. Experiments on both knowledge-aware and standard VQG benchmarks demonstrate that ConVQG outperforms the state-of-the-art methods and generates image-grounded, text-guided, and knowledge-rich questions. Our human evaluation results also show preference for ConVQG questions compared to non-contrastive baselines.
- [1950] arXiv:2402.12847 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Instruction-tuned Language Models are Better Knowledge LearnersZhengbao Jiang , Zhiqing Sun , Weijia Shi , Pedro Rodriguez , Chunting Zhou , Graham Neubig , Xi Victoria Lin , Wen-tau Yih , Srinivasan IyerSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: In order for large language model (LLM)-based assistants to effectively adapt to evolving information needs, it must be possible to update their factual knowledge through continued training on new data. The standard recipe for doing so involves continued pre-training on new documents followed by instruction-tuning on question-answer (QA) pairs. However, we find that LLMs trained with this recipe struggle to answer questions, even though the perplexity of documents is minimized. We found that QA pairs are generally straightforward, while documents are more complex, weaving many factual statements together in an intricate manner. Therefore, we hypothesize that it is beneficial to expose LLMs to QA pairs before continued pre-training on documents so that the process of encoding knowledge from complex documents takes into account how this knowledge is accessed through questions. Based on this, we propose pre-instruction-tuning (PIT), a method that instruction-tunes on questions prior to training on documents. This contrasts with standard instruction-tuning, which learns how to extract knowledge after training on documents. Extensive experiments and ablation studies demonstrate that PIT significantly enhances the ability of LLMs to absorb knowledge from new documents, outperforming standard instruction-tuning by 17.8%.
- [1951] arXiv:2402.12865 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Backward Lens: Projecting Language Model Gradients into the Vocabulary SpaceSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Understanding how Transformer-based Language Models (LMs) learn and recall information is a key goal of the deep learning community. Recent interpretability methods project weights and hidden states obtained from the forward pass to the models' vocabularies, helping to uncover how information flows within LMs. In this work, we extend this methodology to LMs' backward pass and gradients. We first prove that a gradient matrix can be cast as a low-rank linear combination of its forward and backward passes' inputs. We then develop methods to project these gradients into vocabulary items and explore the mechanics of how new information is stored in the LMs' neurons.
- [1952] arXiv:2402.12916 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Data Pipeline Training: Integrating AutoML to Optimize the Data Flow of Machine Learning ModelsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Data Pipeline plays an indispensable role in tasks such as modeling machine learning and developing data products. With the increasing diversification and complexity of Data sources, as well as the rapid growth of data volumes, building an efficient Data Pipeline has become crucial for improving work efficiency and solving complex problems. This paper focuses on exploring how to optimize data flow through automated machine learning methods by integrating AutoML with Data Pipeline. We will discuss how to leverage AutoML technology to enhance the intelligence of Data Pipeline, thereby achieving better results in machine learning tasks. By delving into the automation and optimization of Data flows, we uncover key strategies for constructing efficient data pipelines that can adapt to the ever-changing data landscape. This not only accelerates the modeling process but also provides innovative solutions to complex problems, enabling more significant outcomes in increasingly intricate data domains. Keywords- Data Pipeline Training;AutoML; Data environment; Machine learning
- [1953] arXiv:2402.12928 (cross-list from cs.DL) [ pdf , ps , html , other ]
-
Title: A Literature Review of Literature Reviews in Pattern Analysis and Machine IntelligenceComments: IEEE version v1. [February 19, 2024] IEEE version v2 with typos fixed. [February 23, 2024] IEEE version v3 with errors fixed. [February 29, 2024] IEEE version v4 with improved quaility. [February 29, 2024]Subjects: Digital Libraries (cs.DL) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: By consolidating scattered knowledge, the literature review provides a comprehensive understanding of the investigated topic. However, reading, conducting, or peer-reviewing review papers generally demands a significant investment of time and effort from researchers. To improve efficiency, this paper aims to provide a thorough review of reviews in the PAMI field from diverse perspectives. First, this paper proposes several article-level, field-normalized, and large language model-empowered bibliometric indicators to evaluate reviews. To facilitate this, a meta-data database dubbed RiPAMI, and a topic dataset are constructed. Second, based on these indicators, the study presents comparative analyses of representative reviews, unveiling the characteristics of publications across various fields, periods, and journals. The newly emerging AI-generated literature reviews are also appraised, and the observed differences suggest that most AI-generated reviews still lag behind human-authored reviews in multiple aspects. Third, we briefly provide a subjective evaluation of representative PAMI reviews and introduce a paper structure-based typology of literature reviews. This typology may improve the clarity and effectiveness for scholars in reading and writing reviews, while also serving as a guide for AI systems in generating well-organized reviews. Finally, this work offers insights into the current challenges of literature reviews and envisions future directions for their development.
- [1954] arXiv:2402.12939 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Discovering Behavioral Modes in Deep Reinforcement Learning Policies Using Trajectory Clustering in Latent SpaceComments: Submitted to the European Control Conference 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Understanding the behavior of deep reinforcement learning (DRL) agents is crucial for improving their performance and reliability. However, the complexity of their policies often makes them challenging to understand. In this paper, we introduce a new approach for investigating the behavior modes of DRL policies, which involves utilizing dimensionality reduction and trajectory clustering in the latent space of neural networks. Specifically, we use Pairwise Controlled Manifold Approximation Projection (PaCMAP) for dimensionality reduction and TRACLUS for trajectory clustering to analyze the latent space of a DRL policy trained on the Mountain Car control task. Our methodology helps identify diverse behavior patterns and suboptimal choices by the policy, thus allowing for targeted improvements. We demonstrate how our approach, combined with domain knowledge, can enhance a policy's performance in specific regions of the state space.
- [1955] arXiv:2402.12950 (cross-list from cs.SE) [ pdf , ps , html , other ]
-
Title: QuanTest: Entanglement-Guided Testing of Quantum Neural Network SystemsSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: Quantum Neural Network (QNN) combines the Deep Learning (DL) principle with the fundamental theory of quantum mechanics to achieve machine learning tasks with quantum acceleration. Recently, QNN systems have been found to manifest robustness issues similar to classical DL systems. There is an urgent need for ways to test their correctness and security. However, QNN systems differ significantly from traditional quantum software and classical DL systems, posing critical challenges for QNN testing. These challenges include the inapplicability of traditional quantum software testing methods, the dependence of quantum test sample generation on perturbation operators, and the absence of effective information in quantum neurons. In this paper, we propose QuanTest, a quantum entanglement-guided adversarial testing framework to uncover potential erroneous behaviors in QNN systems. We design a quantum entanglement adequacy criterion to quantify the entanglement acquired by the input quantum states from the QNN system, along with two similarity metrics to measure the proximity of generated quantum adversarial examples to the original inputs. Subsequently, QuanTest formulates the problem of generating test inputs that maximize the quantum entanglement sufficiency and capture incorrect behaviors of the QNN system as a joint optimization problem and solves it in a gradient-based manner to generate quantum adversarial examples. Experimental results demonstrate that QuanTest possesses the capability to capture erroneous behaviors in QNN systems (generating 67.48%-96.05% more test samples than the random noise under the same perturbation size constraints). The entanglement-guided approach proves effective in adversarial testing, generating more adversarial examples (maximum increase reached 21.32%).
- [1956] arXiv:2402.12954 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Conditional Logical Message Passing Transformer for Complex Query AnsweringComments: 13 pages, 3 figures, and 12 tablesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
Abstract: Complex Query Answering (CQA) over Knowledge Graphs (KGs) is a challenging task. Given that KGs are usually incomplete, neural models are proposed to solve CQA by performing multi-hop logical reasoning. However, most of them cannot perform well on both one-hop and multi-hop queries simultaneously. Recent work proposes a logical message passing mechanism based on the pre-trained neural link predictors. While effective on both one-hop and multi-hop queries, it ignores the difference between the constant and variable nodes in a query graph. In addition, during the node embedding update stage, this mechanism cannot dynamically measure the importance of different messages, and whether it can capture the implicit logical dependencies related to a node and received messages remains unclear. In this paper, we propose Conditional Logical Message Passing Transformer (CLMPT), which considers the difference between constants and variables in the case of using pre-trained neural link predictors and performs message passing conditionally on the node type. We empirically verified that this approach can reduce computational costs without affecting performance. Furthermore, CLMPT uses the transformer to aggregate received messages and update the corresponding node embedding. Through the self-attention mechanism, CLMPT can assign adaptive weights to elements in an input set consisting of received messages and the corresponding node and explicitly model logical dependencies between various elements. Experimental results show that CLMPT is a new state-of-the-art neural CQA model.
- [1957] arXiv:2402.12969 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Gl\'orIA -- A Generative and Open Large Language Model for PortugueseComments: Accepted for publication at PROPOR 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Significant strides have been made in natural language tasks, largely attributed to the emergence of powerful large language models (LLMs). These models, pre-trained on extensive and diverse corpora, have become increasingly capable of comprehending the intricacies of language. Despite the abundance of LLMs for many high-resource languages, the availability of such models remains limited for European Portuguese. We introduce GlórIA, a robust European Portuguese decoder LLM. To pre-train GlórIA, we assembled a comprehensive PT-PT text corpus comprising 35 billion tokens from various sources. We present our pre-training methodology, followed by an assessment of the model's effectiveness on multiple downstream tasks. Additionally, to evaluate our models' language modeling capabilities, we introduce CALAME-PT (Context-Aware LAnguage Modeling Evaluation for Portuguese), the first Portuguese zero-shot language-modeling benchmark. Evaluation shows that GlórIA significantly outperforms existing open PT decoder models in language modeling and that it can generate sound, knowledge-rich, and coherent PT-PT text. The model also exhibits strong potential for various downstream tasks.
- [1958] arXiv:2402.12976 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: The Impact of Demonstrations on Multilingual In-Context Learning: A Multidimensional AnalysisMiaoran Zhang , Vagrant Gautam , Mingyang Wang , Jesujoba O. Alabi , Xiaoyu Shen , Dietrich Klakow , Marius MosbachSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In-context learning is a popular inference strategy where large language models solve a task using only a few labelled demonstrations without needing any parameter updates. Compared to work on monolingual (English) in-context learning, multilingual in-context learning is under-explored, and we lack an in-depth understanding of the role of demonstrations in this context. To address this gap, we conduct a multidimensional analysis of multilingual in-context learning, experimenting with 5 models from different model families, 9 datasets covering classification and generation tasks, and 56 typologically diverse languages. Our results reveal that the effectiveness of demonstrations varies significantly across models, tasks, and languages. We also find that Llama 2-Chat, GPT-3.5, and GPT-4 are largely insensitive to the quality of demonstrations. Instead, a carefully crafted template often eliminates the benefits of demonstrations for some tasks and languages altogether. These findings show that the importance of demonstrations might be overestimated. Our work highlights the need for granular evaluation across multiple axes towards a better understanding of in-context learning.
- [1959] arXiv:2402.12984 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Can GNN be Good Adapter for LLMs?Comments: Accepted by WWW'24Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Recently, large language models (LLMs) have demonstrated superior capabilities in understanding and zero-shot learning on textual data, promising significant advances for many text-related domains. In the graph domain, various real-world scenarios also involve textual data, where tasks and node features can be described by text. These text-attributed graphs (TAGs) have broad applications in social media, recommendation systems, etc. Thus, this paper explores how to utilize LLMs to model TAGs. Previous methods for TAG modeling are based on million-scale LMs. When scaled up to billion-scale LLMs, they face huge challenges in computational costs. Additionally, they also ignore the zero-shot inference capabilities of LLMs. Therefore, we propose GraphAdapter, which uses a graph neural network (GNN) as an efficient adapter in collaboration with LLMs to tackle TAGs. In terms of efficiency, the GNN adapter introduces only a few trainable parameters and can be trained with low computation costs. The entire framework is trained using auto-regression on node text (next token prediction). Once trained, GraphAdapter can be seamlessly fine-tuned with task-specific prompts for various downstream tasks. Through extensive experiments across multiple real-world TAGs, GraphAdapter based on Llama 2 gains an average improvement of approximately 5\% in terms of node classification. Furthermore, GraphAdapter can also adapt to other language models, including RoBERTa, GPT-2. The promising results demonstrate that GNNs can serve as effective adapters for LLMs in TAG modeling.
- [1960] arXiv:2402.12991 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box IdentificationSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
Abstract: Large Language Model (LLM) services and models often come with legal rules on who can use them and how they must use them. Assessing the compliance of the released LLMs is crucial, as these rules protect the interests of the LLM contributor and prevent misuse. In this context, we describe the novel problem of Black-box Identity Verification (BBIV). The goal is to determine whether a third-party application uses a certain LLM through its chat function. We propose a method called Targeted Random Adversarial Prompt (TRAP) that identifies the specific LLM in use. We repurpose adversarial suffixes, originally proposed for jailbreaking, to get a pre-defined answer from the target LLM, while other models give random answers. TRAP detects the target LLMs with over 95% true positive rate at under 0.2% false positive rate even after a single interaction. TRAP remains effective even if the LLM has minor changes that do not significantly alter the original function.
- [1961] arXiv:2402.12993 (cross-list from cs.IR) [ pdf , ps , html , other ]
-
Title: An Autonomous Large Language Model Agent for Chemical Literature Data MiningKexin Chen , Hanqun Cao , Junyou Li , Yuyang Du , Menghao Guo , Xin Zeng , Lanqing Li , Jiezhong Qiu , Pheng Ann Heng , Guangyong ChenSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
Abstract: Chemical synthesis, which is crucial for advancing material synthesis and drug discovery, impacts various sectors including environmental science and healthcare. The rise of technology in chemistry has generated extensive chemical data, challenging researchers to discern patterns and refine synthesis processes. Artificial intelligence (AI) helps by analyzing data to optimize synthesis and increase yields. However, AI faces challenges in processing literature data due to the unstructured format and diverse writing style of chemical literature. To overcome these difficulties, we introduce an end-to-end AI agent framework capable of high-fidelity extraction from extensive chemical literature. This AI agent employs large language models (LLMs) for prompt generation and iterative optimization. It functions as a chemistry assistant, automating data collection and analysis, thereby saving manpower and enhancing performance. Our framework's efficacy is evaluated using accuracy, recall, and F1 score of reaction condition data, and we compared our method with human experts in terms of content correctness and time efficiency. The proposed approach marks a significant advancement in automating chemical literature extraction and demonstrates the potential for AI to revolutionize data management and utilization in chemistry.
- [1962] arXiv:2402.13025 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: CFEVER: A Chinese Fact Extraction and VERification DatasetYing-Jia Lin , Chun-Yi Lin , Chia-Jen Yeh , Yi-Ting Li , Yun-Yu Hu , Chih-Hao Hsu , Mei-Feng Lee , Hung-Yu KaoComments: AAAI-24Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: We present CFEVER, a Chinese dataset designed for Fact Extraction and VERification. CFEVER comprises 30,012 manually created claims based on content in Chinese Wikipedia. Each claim in CFEVER is labeled as "Supports", "Refutes", or "Not Enough Info" to depict its degree of factualness. Similar to the FEVER dataset, claims in the "Supports" and "Refutes" categories are also annotated with corresponding evidence sentences sourced from single or multiple pages in Chinese Wikipedia. Our labeled dataset holds a Fleiss' kappa value of 0.7934 for five-way inter-annotator agreement. In addition, through the experiments with the state-of-the-art approaches developed on the FEVER dataset and a simple baseline for CFEVER, we demonstrate that our dataset is a new rigorous benchmark for factual extraction and verification, which can be further used for developing automated systems to alleviate human fact-checking efforts. CFEVER is available at this https URL .
- [1963] arXiv:2402.13028 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Heterogeneous Graph Reasoning for Fact Checking over Texts and TablesComments: Accepted by 38th Association for the Advancement of Artificial Intelligence, AAAISubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Fact checking aims to predict claim veracity by reasoning over multiple evidence pieces. It usually involves evidence retrieval and veracity reasoning. In this paper, we focus on the latter, reasoning over unstructured text and structured table information. Previous works have primarily relied on fine-tuning pretrained language models or training homogeneous-graph-based models. Despite their effectiveness, we argue that they fail to explore the rich semantic information underlying the evidence with different structures. To address this, we propose a novel word-level Heterogeneous-graph-based model for Fact Checking over unstructured and structured information, namely HeterFC. Our approach leverages a heterogeneous evidence graph, with words as nodes and thoughtfully designed edges representing different evidence properties. We perform information propagation via a relational graph neural network, facilitating interactions between claims and evidence. An attention-based method is utilized to integrate information, combined with a language model for generating predictions. We introduce a multitask loss function to account for potential inaccuracies in evidence retrieval. Comprehensive experiments on the large fact checking dataset FEVEROUS demonstrate the effectiveness of HeterFC. Code will be released at: this https URL .
- [1964] arXiv:2402.13035 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Learning to Check: Unleashing Potentials for Self-Correction in Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large language models (LLMs) have made significant strides in reasoning capabilities, with ongoing efforts to refine their reasoning through self-correction. However, recent studies suggest that self-correction can be limited or even counterproductive without external accurate knowledge, raising questions about the limits and effectiveness of self-correction. In this paper, we aim to enhance LLM's self-checking capabilities by meticulously designing training data, thereby improving the accuracy of self-correction. We conduct a detailed analysis of error types in mathematical reasoning and develop a tailored prompt, termed "Step CoT Check". Then we construct a checking-correction dataset for training models. After integrating the original CoT data and checking-correction data for training, we observe that models could improve their self-checking capabilities, thereby enhancing their self-correction capacity and eliminating the need for external feedback or ground truth labels to ascertain the endpoint of correction. We compare the performance of models fine-tuned with the "Step CoT Check" prompt against those refined using other promps within the context of checking-correction data. The "Step CoT Check" outperforms the other two check formats in model with lager parameters, providing more precise feedback thus achieving a higher rate of correctness. For reproducibility, all the datasets and codes are provided in this https URL .
- [1965] arXiv:2402.13037 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Align Your Intents: Offline Imitation Learning via Optimal TransportSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Offline reinforcement learning (RL) addresses the problem of sequential decision-making by learning optimal policy through pre-collected data, without interacting with the environment. As yet, it has remained somewhat impractical, because one rarely knows the reward explicitly and it is hard to distill it retrospectively. Here, we show that an imitating agent can still learn the desired behavior merely from observing the expert, despite the absence of explicit rewards or action labels. In our method, AILOT (Aligned Imitation Learning via Optimal Transport), we involve special representation of states in a form of intents that incorporate pairwise spatial distances within the data. Given such representations, we define intrinsic reward function via optimal transport distance between the expert's and the agent's trajectories. We report that AILOT outperforms state-of-the art offline imitation learning algorithms on D4RL benchmarks and improves the performance of other offline RL algorithms in the sparse-reward tasks.
- [1966] arXiv:2402.13040 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Text-Guided Molecule Generation with Diffusion Language ModelComments: Accepted by 38th Association for the Advancement of Artificial Intelligence, AAAISubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Computation and Language (cs.CL); Biomolecules (q-bio.BM)
Abstract: Text-guided molecule generation is a task where molecules are generated to match specific textual descriptions. Recently, most existing SMILES-based molecule generation methods rely on an autoregressive architecture. In this work, we propose the Text-Guided Molecule Generation with Diffusion Language Model (TGM-DLM), a novel approach that leverages diffusion models to address the limitations of autoregressive methods. TGM-DLM updates token embeddings within the SMILES string collectively and iteratively, using a two-phase diffusion generation process. The first phase optimizes embeddings from random noise, guided by the text description, while the second phase corrects invalid SMILES strings to form valid molecular representations. We demonstrate that TGM-DLM outperforms MolT5-Base, an autoregressive model, without the need for additional data resources. Our findings underscore the remarkable effectiveness of TGM-DLM in generating coherent and precise molecules with specific properties, opening new avenues in drug discovery and related scientific domains. Code will be released at: this https URL .
- [1967] arXiv:2402.13055 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Identifying Semantic Induction Heads to Understand In-Context LearningSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Although large language models (LLMs) have demonstrated remarkable performance, the lack of transparency in their inference logic raises concerns about their trustworthiness. To gain a better understanding of LLMs, we conduct a detailed analysis of the operations of attention heads and aim to better understand the in-context learning of LLMs. Specifically, we investigate whether attention heads encode two types of relationships between tokens present in natural languages: the syntactic dependency parsed from sentences and the relation within knowledge graphs. We find that certain attention heads exhibit a pattern where, when attending to head tokens, they recall tail tokens and increase the output logits of those tail tokens. More crucially, the formulation of such semantic induction heads has a close correlation with the emergence of the in-context learning ability of language models. The study of semantic attention heads advances our understanding of the intricate operations of attention heads in transformers, and further provides new insights into the in-context learning of LLMs.
- [1968] arXiv:2402.13077 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Mechanistic Neural Networks for Scientific Machine LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Abstract: This paper presents Mechanistic Neural Networks, a neural network design for machine learning applications in the sciences. It incorporates a new Mechanistic Block in standard architectures to explicitly learn governing differential equations as representations, revealing the underlying dynamics of data and enhancing interpretability and efficiency in data modeling. Central to our approach is a novel Relaxed Linear Programming Solver (NeuRLP) inspired by a technique that reduces solving linear ODEs to solving linear programs. This integrates well with neural networks and surpasses the limitations of traditional ODE solvers enabling scalable GPU parallel processing. Overall, Mechanistic Neural Networks demonstrate their versatility for scientific machine learning applications, adeptly managing tasks from equation discovery to dynamic systems modeling. We prove their comprehensive capabilities in analyzing and interpreting complex scientific data across various applications, showing significant performance against specialized state-of-the-art methods.
- [1969] arXiv:2402.13089 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Towards an empirical understanding of MoE design choicesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: In this study, we systematically evaluate the impact of common design choices in Mixture of Experts (MoEs) on validation performance, uncovering distinct influences at token and sequence levels. We also present empirical evidence showing comparable performance between a learned router and a frozen, randomly initialized router, suggesting that learned routing may not be essential. Our study further reveals that Sequence-level routing can result in topic-specific weak expert specialization, in contrast to syntax specialization observed with Token-level routing.
- [1970] arXiv:2402.13093 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Event-level Knowledge EditingHao Peng , Xiaozhi Wang , Chunyang Li , Kaisheng Zeng , Jiangshan Duo , Yixin Cao , Lei Hou , Juanzi LiComments: 18 pages, 2 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Knowledge editing aims at updating knowledge of large language models (LLMs) to prevent them from becoming outdated. Existing work edits LLMs at the level of factual knowledge triplets. However, natural knowledge updates in the real world come from the occurrences of new events rather than direct changes in factual triplets. In this paper, we propose a new task setting: event-level knowledge editing, which directly edits new events into LLMs and improves over conventional triplet-level editing on (1) Efficiency. A single event edit leads to updates in multiple entailed knowledge triplets. (2) Completeness. Beyond updating factual knowledge, event-level editing also requires considering the event influences and updating LLMs' knowledge about future trends. We construct a high-quality event-level editing benchmark ELKEN, consisting of 1,515 event edits, 6,449 questions about factual knowledge, and 10,150 questions about future tendencies. We systematically evaluate the performance of various knowledge editing methods and LLMs on this benchmark. We find that ELKEN poses significant challenges to existing knowledge editing approaches. Our codes and dataset are publicly released to facilitate further research.
- [1971] arXiv:2402.13098 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: ELAD: Explanation-Guided Large Language Models Active DistillationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The deployment and application of Large Language Models (LLMs) is hindered by their memory inefficiency, computational demands, and the high costs of API inferences. Traditional distillation methods, which transfer the capabilities of LLMs to smaller models, often fail to determine whether the knowledge has been sufficiently transferred, potentially resulting in high costs or incomplete distillation. In this paper, we propose an Explanation-Guided LLMs Active Distillation (ELAD) framework that employs an active learning strategy to optimize the balance between annotation costs and model performance. To improve efficient sample selection, we introduce an explanation-guided sample selection method that identifies samples challenging its reasoning by exploiting uncertainties in explanation steps. Additionally, we present a customized LLM-annotated explanation revision technique where the teacher model detects and corrects flaws in the student model's reasoning. Our experiments across various reasoning datasets demonstrate that our framework significantly enhances the efficiency of LLM knowledge distillation.
- [1972] arXiv:2402.13109 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language ModelsYizhi LI , Ge Zhang , Xingwei Qu , Jiali Li , Zhaoqun Li , Zekun Wang , Hao Li , Ruibin Yuan , Yinghao Ma , Kai Zhang , Wangchunshu Zhou , Yiming Liang , Lei Zhang , Lei Ma , Jiajun Zhang , Zuowen Li , Stephen W. Huang , Chenghua Lin , Wenhu Chen , Jie FuSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The advancement of large language models (LLMs) has enhanced the ability to generalize across a wide range of unseen natural language processing (NLP) tasks through instruction-following. Yet, their effectiveness often diminishes in low-resource languages like Chinese, exacerbated by biased evaluations from data leakage, casting doubt on their true generalizability to new linguistic territories. In response, we introduce the Chinese Instruction-Following Benchmark (CIF-Bench), designed to evaluate the zero-shot generalizability of LLMs to the Chinese language. CIF-Bench comprises 150 tasks and 15,000 input-output pairs, developed by native speakers to test complex reasoning and Chinese cultural nuances across 20 categories. To mitigate evaluation bias, we release only half of the dataset publicly, with the remainder kept private, and introduce diversified instructions to minimize score variance, totaling 45,000 data instances. Our evaluation of 28 selected LLMs reveals a noticeable performance gap, with the best model scoring only 52.9%, highlighting the limitations of LLMs in less familiar language and task contexts. This work aims to uncover the current limitations of LLMs in handling Chinese tasks, pushing towards the development of more culturally informed and linguistically diverse models with the released data and benchmark ( this https URL ).
- [1973] arXiv:2402.13114 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: BuffGraph: Enhancing Class-Imbalanced Node Classification via Buffer NodesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Class imbalance in graph-structured data, where minor classes are significantly underrepresented, poses a critical challenge for Graph Neural Networks (GNNs). To address this challenge, existing studies generally generate new minority nodes and edges connecting new nodes to the original graph to make classes balanced. However, they do not solve the problem that majority classes still propagate information to minority nodes by edges in the original graph which introduces bias towards majority classes. To address this, we introduce BuffGraph, which inserts buffer nodes into the graph, modulating the impact of majority classes to improve minor class representation. Our extensive experiments across diverse real-world datasets empirically demonstrate that BuffGraph outperforms existing baseline methods in class-imbalanced node classification in both natural settings and imbalanced settings. Code is available at https://anonymous.4open.science/r/BuffGraph-730A.
- [1974] arXiv:2402.13125 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: TreeEval: Benchmark-Free Evaluation of Large Language Models through Tree PlanningSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Recently, numerous new benchmarks have been established to evaluate the performance of large language models (LLMs) via either computing a holistic score or employing another LLM as a judge. However, these approaches suffer from data leakage due to the open access of the benchmark and inflexible evaluation process. To address this issue, we introduce $\textbf{TreeEval}$, a benchmark-free evaluation method for LLMs that let a high-performance LLM host an irreproducible evaluation session and essentially avoids the data leakage. Moreover, this LLM performs as an examiner to raise up a series of questions under a topic with a tree planing strategy, which considers the current evaluation status to decide the next question generation and ensures the completeness and efficiency of the evaluation process. We evaluate $6$ models of different parameter sizes, including $7$B, $13$B, and $33$B, and ultimately achieved the highest correlation coefficient with AlpacaEval2.0 using only around $45$ questions. We also conduct more analysis to show the robustness and reliability of TreeEval. Our code can be accessed via the provided this https URL .
- [1975] arXiv:2402.13126 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: VGMShield: Mitigating Misuse of Video Generative ModelsComments: 17 pages, 10 figuresSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Abstract: With the rapid advancement in video generation, people can conveniently utilize video generation models to create videos tailored to their specific desires. Nevertheless, there are also growing concerns about their potential misuse in creating and disseminating false information.
In this work, we introduce VGMShield: a set of three straightforward but pioneering mitigations through the lifecycle of fake video generation. We start from \textit{fake video detection} trying to understand whether there is uniqueness in generated videos and whether we can differentiate them from real videos; then, we investigate the \textit{tracing} problem, which maps a fake video back to a model that generates it. Towards these, we propose to leverage pre-trained models that focus on {\it spatial-temporal dynamics} as the backbone to identify inconsistencies in videos. Through experiments on seven state-of-the-art open-source models, we demonstrate that current models still cannot perfectly handle spatial-temporal relationships, and thus, we can accomplish detection and tracing with nearly perfect accuracy.
Furthermore, anticipating future generative model improvements, we propose a {\it prevention} method that adds invisible perturbations to images to make the generated videos look unreal. Together with fake video detection and tracing, our multi-faceted set of solutions can effectively mitigate misuse of video generative models. - [1976] arXiv:2402.13145 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: CMDAG: A Chinese Metaphor Dataset with Annotated Grounds as CoT for Boosting Metaphor GenerationYujie Shao , Xinrong Yao , Xingwei Qu , Chenghua Lin , Shi Wang , Stephen W. Huang , Ge Zhang , Jie FuSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Metaphor is a prominent linguistic device in human language and literature, as they add color, imagery, and emphasis to enhance effective communication. This paper introduces a large-scale high quality annotated Chinese Metaphor Corpus, which comprises around 28K sentences drawn from a diverse range of Chinese literary sources, such as poems, prose, song lyrics, etc. To ensure the accuracy and consistency of our annotations, we introduce a comprehensive set of guidelines. These guidelines address the facets of metaphor annotation, including identifying tenors, vehicles, and grounds to handling the complexities of similes, personifications, juxtapositions, and hyperboles. Breaking tradition, our approach to metaphor generation emphasizes grounds and their distinct features rather than the conventional combination of tenors and vehicles. By integrating "ground" as a CoT (Chain of Thoughts) input, we are able to generate metaphors that resonate more with real-world intuition. We test generative models such as Belle, Baichuan, and Chinese-alpaca-33B using our annotated corpus. These models are able to generate creative and fluent metaphor sentences more frequently induced by selected samples from our dataset, demonstrating the value of our corpus for Chinese metaphor research. The code is available in this https URL .
- [1977] arXiv:2402.13147 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: SubIQ: Inverse Soft-Q Learning for Offline Imitation with Suboptimal DemonstrationsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: We consider offline imitation learning (IL), which aims to mimic the expert's behavior from its demonstration without further interaction with the environment. One of the main challenges in offline IL is dealing with the limited support of expert demonstrations that cover only a small fraction of the state-action spaces. In this work, we consider offline IL, where expert demonstrations are limited but complemented by a larger set of sub-optimal demonstrations of lower expertise levels. Most of the existing offline IL methods developed for this setting are based on behavior cloning or distribution matching, where the aim is to match the occupancy distribution of the imitation policy with that of the expert policy. Such an approach often suffers from over-fitting, as expert demonstrations are limited to accurately represent any occupancy distribution. On the other hand, since sub-optimal sets are much larger, there is a high chance that the imitation policy is trained towards sub-optimal policies. In this paper, to address these issues, we propose a new approach based on inverse soft-Q learning, where a regularization term is added to the training objective, with the aim of aligning the learned rewards with a pre-assigned reward function that allocates higher weights to state-action pairs from expert demonstrations, and lower weights to those from lower expertise levels. On standard benchmarks, our inverse soft-Q learning significantly outperforms other offline IL baselines by a large margin.
- [1978] arXiv:2402.13178 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Benchmarking Retrieval-Augmented Generation for MedicineComments: Homepage: this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: While large language models (LLMs) have achieved state-of-the-art performance on a wide range of medical question answering (QA) tasks, they still face challenges with hallucinations and outdated knowledge. Retrieval-augmented generation (RAG) is a promising solution and has been widely adopted. However, a RAG system can involve multiple flexible components, and there is a lack of best practices regarding the optimal RAG setting for various medical purposes. To systematically evaluate such systems, we propose the Medical Information Retrieval-Augmented Generation Evaluation (MIRAGE), a first-of-its-kind benchmark including 7,663 questions from five medical QA datasets. Using MIRAGE, we conducted large-scale experiments with over 1.8 trillion prompt tokens on 41 combinations of different corpora, retrievers, and backbone LLMs through the MedRAG toolkit introduced in this work. Overall, MedRAG improves the accuracy of six different LLMs by up to 18% over chain-of-thought prompting, elevating the performance of GPT-3.5 and Mixtral to GPT-4-level. Our results show that the combination of various medical corpora and retrievers achieves the best performance. In addition, we discovered a log-linear scaling property and the "lost-in-the-middle" effects in medical RAG. We believe our comprehensive evaluations can serve as practical guidelines for implementing RAG systems for medicine.
- [1979] arXiv:2402.13201 (cross-list from cs.RO) [ pdf , ps , html , other ]
-
Title: Tiny Reinforcement Learning for Quadruped Locomotion using Decision TransformersComments: 10 pages, 4 figuresSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Resource-constrained robotic platforms are particularly useful for tasks that require low-cost hardware alternatives due to the risk of losing the robot, like in search-and-rescue applications, or the need for a large number of devices, like in swarm robotics. For this reason, it is crucial to find mechanisms for adapting reinforcement learning techniques to the constraints imposed by lower computational power and smaller memory capacities of these ultra low-cost robotic platforms. We try to address this need by proposing a method for making imitation learning deployable onto resource-constrained robotic platforms. Here we cast the imitation learning problem as a conditional sequence modeling task and we train a decision transformer using expert demonstrations augmented with a custom reward. Then, we compress the resulting generative model using software optimization schemes, including quantization and pruning. We test our method in simulation using Isaac Gym, a realistic physics simulation environment designed for reinforcement learning. We empirically demonstrate that our method achieves natural looking gaits for Bittle, a resource-constrained quadruped robot. We also run multiple simulations to show the effects of pruning and quantization on the performance of the model. Our results show that quantization (down to 4 bits) and pruning reduce model size by around 30\% while maintaining a competitive reward, making the model deployable in a resource-constrained system.
- [1980] arXiv:2402.13208 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: How do Hyenas deal with Human Speech? Speech Recognition and Translation with ConfHyenaComments: Accepted at LREC-COLING 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The attention mechanism, a cornerstone of state-of-the-art neural models, faces computational hurdles in processing long sequences due to its quadratic complexity. Consequently, research efforts in the last few years focused on finding more efficient alternatives. Among them, Hyena (Poli et al., 2023) stands out for achieving competitive results in both language modeling and image classification, while offering sub-quadratic memory and computational complexity. Building on these promising results, we propose ConfHyena, a Conformer whose encoder self-attentions are replaced with an adaptation of Hyena for speech processing, where the long input sequences cause high computational costs. Through experiments in automatic speech recognition (for English) and translation (from English into 8 target languages), we show that our best ConfHyena model significantly reduces the training time by 27%, at the cost of minimal quality degradation (~1%), which, in most cases, is not statistically significant.
- [1981] arXiv:2402.13212 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Soft Self-Consistency Improves Language Model AgentsComments: 14 pages, the first three authors contributed equally; Code: this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Generations from large language models (LLMs) can be improved by sampling and scoring multiple solutions to select a final answer. Current "sample and select" methods such as self-consistency (SC) rely on majority voting to score answers. However, when tasks have many distinct and valid answers, selection by voting requires a large number of samples. This makes SC prohibitively expensive for interactive tasks that involve generating multiple actions (answers) sequentially. After establishing that majority voting fails to provide consistent gains on such tasks, we demonstrate how to increase success rates by softening the scoring criterion. We introduce Soft Self-Consistency (Soft-SC), which replaces SC's discontinuous scoring with a continuous score computed from model likelihoods, allowing for selection even when actions are sparsely distributed. Soft-SC improves both performance and efficiency on long-horizon interactive tasks, requiring half as many samples as SC for comparable or better performance. For a fixed number of samples, Soft-SC leads to a 1.3% increase over SC in absolute success rate on writing bash programs, a 6.6% increase on online shopping (WebShop), and a 4.7% increase for an interactive household game (ALFWorld). Finally, we show that Soft-SC can be applied to both open-source and black-box models.
- [1982] arXiv:2402.13213 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Softmax Probabilities (Mostly) Predict Large Language Model Correctness on Multiple-Choice Q&ASubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Although large language models (LLMs) perform impressively on many tasks, overconfidence remains a problem. We hypothesized that on multiple-choice Q&A tasks, wrong answers would be associated with smaller maximum softmax probabilities (MSPs) compared to correct answers. We comprehensively evaluate this hypothesis on ten open-source LLMs and five datasets, and find strong evidence for our hypothesis among models which perform well on the original Q&A task. For the six LLMs with the best Q&A performance, the AUROC derived from the MSP was better than random chance with p < 10^{-4} in 59/60 instances. Among those six LLMs, the average AUROC ranged from 60% to 69%. Leveraging these findings, we propose a multiple-choice Q&A task with an option to abstain and show that performance can be improved by selectively abstaining based on the MSP of the initial model response. We also run the same experiments with pre-softmax logits instead of softmax probabilities and find similar (but not identical) results.
- [1983] arXiv:2402.13217 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: VideoPrism: A Foundational Visual Encoder for Video UnderstandingLong Zhao , Nitesh B. Gundavarapu , Liangzhe Yuan , Hao Zhou , Shen Yan , Jennifer J. Sun , Luke Friedman , Rui Qian , Tobias Weyand , Yue Zhao , Rachel Hornung , Florian Schroff , Ming-Hsuan Yang , David A. Ross , Huisheng Wang , Hartwig Adam , Mikhail Sirotenko , Ting Liu , Boqing GongSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: We introduce VideoPrism, a general-purpose video encoder that tackles diverse video understanding tasks with a single frozen model. We pretrain VideoPrism on a heterogeneous corpus containing 36M high-quality video-caption pairs and 582M video clips with noisy parallel text (e.g., ASR transcripts). The pretraining approach improves upon masked autoencoding by global-local distillation of semantic video embeddings and a token shuffling scheme, enabling VideoPrism to focus primarily on the video modality while leveraging the invaluable text associated with videos. We extensively test VideoPrism on four broad groups of video understanding tasks, from web video question answering to CV for science, achieving state-of-the-art performance on 30 out of 33 video understanding benchmarks.
- [1984] arXiv:2402.13224 (cross-list from math.OC) [ pdf , ps , html , other ]
-
Title: Controlling Large Electric Vehicle Charging Stations via User Behavior Modeling and Stochastic ProgrammingSubjects: Optimization and Control (math.OC) ; Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Systems and Control (eess.SY)
Abstract: This paper introduces an Electric Vehicle Charging Station (EVCS) model that incorporates real-world constraints, such as slot power limitations, contract threshold overruns penalties, or early disconnections of electric vehicles (EVs). We propose a formulation of the problem of EVCS control under uncertainty, and implement two Multi-Stage Stochastic Programming approaches that leverage user-provided information, namely, Model Predictive Control and Two-Stage Stochastic Programming. The model addresses uncertainties in charging session start and end times, as well as in energy demand. A user's behavior model based on a sojourn-time-dependent stochastic process enhances cost reduction while maintaining customer satisfaction. The benefits of the two proposed methods are showcased against two baselines over a 22-day simulation using a real-world dataset. The two-stage approach demonstrates robustness against early disconnections by considering a wider range of uncertainty scenarios for optimization. The algorithm prioritizing user satisfaction over electricity cost achieves a 20% and 36% improvement in two user satisfaction metrics compared to an industry-standard baseline. Additionally, the algorithm striking the best balance between cost and user satisfaction exhibits a mere 3% relative cost increase compared to the theoretically optimal baseline - for which the nonanticipativity constraint is relaxed - while attaining 94% and 84% of the user satisfaction performance in the two used satisfaction metrics.
- [1985] arXiv:2402.13225 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: AgentMD: Empowering Language Agents for Risk Prediction with Large-Scale Clinical Tool LearningQiao Jin , Zhizheng Wang , Yifan Yang , Qingqing Zhu , Donald Wright , Thomas Huang , W John Wilbur , Zhe He , Andrew Taylor , Qingyu Chen , Zhiyong LuComments: Work in progressSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Clinical calculators play a vital role in healthcare by offering accurate evidence-based predictions for various purposes such as prognosis. Nevertheless, their widespread utilization is frequently hindered by usability challenges, poor dissemination, and restricted functionality. Augmenting large language models with extensive collections of clinical calculators presents an opportunity to overcome these obstacles and improve workflow efficiency, but the scalability of the manual curation process poses a significant challenge. In response, we introduce AgentMD, a novel language agent capable of curating and applying clinical calculators across various clinical contexts. Using the published literature, AgentMD has automatically curated a collection of 2,164 diverse clinical calculators with executable functions and structured documentation, collectively named RiskCalcs. Manual evaluations show that RiskCalcs tools achieve an accuracy of over 80% on three quality metrics. At inference time, AgentMD can automatically select and apply the relevant RiskCalcs tools given any patient description. On the newly established RiskQA benchmark, AgentMD significantly outperforms chain-of-thought prompting with GPT-4 (87.7% vs. 40.9% in accuracy). Additionally, we also applied AgentMD to real-world clinical notes for analyzing both population-level and risk-level patient characteristics. In summary, our study illustrates the utility of language agents augmented with clinical calculators for healthcare analytics and patient care.
- [1986] arXiv:2402.13226 (cross-list from eess.IV) [ pdf , ps , html , other ]
-
Title: NeRF Solves Undersampled MRI ReconstructionSubjects: Image and Video Processing (eess.IV) ; Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Signal Processing (eess.SP)
Abstract: This article presents a novel undersampled magnetic resonance imaging (MRI) technique that leverages the concept of Neural Radiance Field (NeRF). With radial undersampling, the corresponding imaging problem can be reformulated into an image modeling task from sparse-view rendered data; therefore, a high dimensional MR image is obtainable from undersampled k-space data by taking advantage of implicit neural representation. A multi-layer perceptron, which is designed to output an image intensity from a spatial coordinate, learns the MR physics-driven rendering relation between given measurement data and desired image. Effective undersampling strategies for high-quality neural representation are investigated. The proposed method serves two benefits: (i) The learning is based fully on single undersampled k-space data, not a bunch of measured data and target image sets. It can be used potentially for diagnostic MR imaging, such as fetal MRI, where data acquisition is relatively rare or limited against diversity of clinical images while undersampled reconstruction is highly demanded. (ii) A reconstructed MR image is a scan-specific representation highly adaptive to the given k-space measurement. Numerous experiments validate the feasibility and capability of the proposed approach.
- [1987] arXiv:2402.13228 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: Smaug: Fixing Failure Modes of Preference Optimisation with DPO-PositiveSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Direct Preference Optimisation (DPO) is effective at significantly improving the performance of large language models (LLMs) on downstream tasks such as reasoning, summarisation, and alignment. Using pairs of preferred and dispreferred data, DPO models the \textit{relative} probability of picking one response over another. In this work, first we show theoretically that the standard DPO loss can lead to a \textit{reduction} of the model's likelihood of the preferred examples, as long as the relative probability between the preferred and dispreferred classes increases. We then show empirically that this phenomenon occurs when fine-tuning LLMs on common datasets, especially datasets in which the edit distance between pairs of completions is low. Using these insights, we design DPO-Positive (DPOP), a new loss function and training procedure which avoids this failure mode. Surprisingly, we also find that DPOP significantly outperforms DPO across a wide variety of datasets and downstream tasks, including datasets with high edit distances between completions. By fine-tuning with DPOP, we create and release Smaug-34B and Smaug-72B, which achieve state-of-the-art open-source performance. Notably, Smaug-72B is nearly 2\% better than any other open-source model on the HuggingFace Open LLM Leaderboard and becomes the first open-source LLM to surpass an average accuracy of 80\%.
- [1988] arXiv:2402.13241 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Federated Causal Discovery from Heterogeneous DataLoka Li , Ignavier Ng , Gongxu Luo , Biwei Huang , Guangyi Chen , Tongliang Liu , Bin Gu , Kun ZhangComments: ICLR 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Conventional causal discovery methods rely on centralized data, which is inconsistent with the decentralized nature of data in many real-world situations. This discrepancy has motivated the development of federated causal discovery (FCD) approaches. However, existing FCD methods may be limited by their potentially restrictive assumptions of identifiable functional causal models or homogeneous data distributions, narrowing their applicability in diverse scenarios. In this paper, we propose a novel FCD method attempting to accommodate arbitrary causal models and heterogeneous data. We first utilize a surrogate variable corresponding to the client index to account for the data heterogeneity across different clients. We then develop a federated conditional independence test (FCIT) for causal skeleton discovery and establish a federated independent change principle (FICP) to determine causal directions. These approaches involve constructing summary statistics as a proxy of the raw data to protect data privacy. Owing to the nonparametric properties, FCIT and FICP make no assumption about particular functional forms, thereby facilitating the handling of arbitrary causal models. We conduct extensive experiments on synthetic and real datasets to show the efficacy of our method. The code is available at this https URL .
- [1989] arXiv:2402.13249 (cross-list from cs.CL) [ pdf , ps , html , other ]
-
Title: TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue SummarizationLiyan Tang , Igor Shalyminov , Amy Wing-mei Wong , Jon Burnsky , Jake W. Vincent , Yu'an Yang , Siffi Singh , Song Feng , Hwanjun Song , Hang Su , Lijia Sun , Yi Zhang , Saab Mansour , Kathleen McKeownComments: NAACL 2024; Linguistic annotations available at this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Single document news summarization has seen substantial progress on faithfulness in recent years, driven by research on the evaluation of factual consistency, or hallucinations. We ask whether these advances carry over to other text summarization domains. We propose a new evaluation benchmark on topic-focused dialogue summarization, generated by LLMs of varying sizes. We provide binary sentence-level human annotations of the factual consistency of these summaries along with detailed explanations of factually inconsistent sentences. Our analysis shows that existing LLMs hallucinate significant amounts of factual errors in the dialogue domain, regardless of the model's size. On the other hand, when LLMs, including GPT-4, serve as binary factual evaluators, they perform poorly and can be outperformed by prevailing state-of-the-art specialized factuality evaluation metrics. Finally, we conducted an analysis of hallucination types with a curated error taxonomy. We find that there are diverse errors and error distributions in model-generated summaries and that non-LLM based metrics can capture all error types better than LLM-based evaluators.
- [1990] arXiv:2402.13254 (cross-list from cs.CV) [ pdf , ps , html , other ]
-
Title: CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual ExamplesComments: 13 pages, 6 figures, 8 tables, Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: We propose CounterCurate, a framework to comprehensively improve the visio-linguistic compositional reasoning capability for both contrastive and generative multimodal models. In particular, we identify two critical under-explored problems: the neglect of the physically grounded reasoning (counting and position understanding) and the potential of using highly capable text and image generation models for semantic counterfactual fine-tuning. Our work pioneers an approach that addresses these gaps. We first spotlight the near-chance performance of multimodal models like CLIP and LLaVA in physically grounded compositional reasoning. We then apply simple data augmentation using grounded image generation model GLIGEN to generate fine-tuning data, resulting in significant performance improvements: +33% and +37% for CLIP and LLaVA, respectively, on our newly curated Flickr30k-Positions benchmark. Moreover, we exploit the capabilities of high-performing text generation and image generation models, specifically GPT-4V and DALLE-3, to curate challenging semantic counterfactuals, thereby further enhancing compositional reasoning capabilities on benchmarks such as SugarCrepe, where CounterCurate outperforms GPT-4V.
- [1991] arXiv:2402.13270 (cross-list from physics.ao-ph) [ pdf , ps , html , other ]
-
Title: Global Tropical Cyclone Intensity Forecasting with Multi-modal Multi-scale Causal Autoregressive ModelSubjects: Atmospheric and Oceanic Physics (physics.ao-ph) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an)
Abstract: Accurate forecasting of Tropical cyclone (TC) intensity is crucial for formulating disaster risk reduction strategies. Current methods predominantly rely on limited spatiotemporal information from ERA5 data and neglect the causal relationships between these physical variables, failing to fully capture the spatial and temporal patterns required for intensity forecasting. To address this issue, we propose a Multi-modal multi-Scale Causal AutoRegressive model (MSCAR), which is the first model that combines causal relationships with large-scale multi-modal data for global TC intensity autoregressive forecasting. Furthermore, given the current absence of a TC dataset that offers a wide range of spatial variables, we present the Satellite and ERA5-based Tropical Cyclone Dataset (SETCD), which stands as the longest and most comprehensive global dataset related to TCs. Experiments on the dataset show that MSCAR outperforms the state-of-the-art methods, achieving maximum reductions in global and regional forecast errors of 9.52% and 6.74%, respectively. The code and dataset are publicly available at https://anonymous.4open.science/r/MSCAR.
- [1992] arXiv:2402.13275 (cross-list from q-bio.NC) [ pdf , ps , html , other ]
-
Title: Implementation of a Model of the Cortex Basal Ganglia LoopComments: 7 pages, 3 figuresSubjects: Neurons and Cognition (q-bio.NC) ; Artificial Intelligence (cs.AI)
Abstract: This article presents a simple model of the cortex-basal ganglia-thalamus loop, which is thought to serve for action selection and executions, and reports the results of its implementation. The model is based on the hypothesis that the cerebral cortex predicts actions, while the basal ganglia use reinforcement learning to decide whether to perform the actions predicted by the cortex. The implementation is intended to be used as a component of models of the brain consisting of cortical regions or brain-inspired cognitive architectures.
- [1993] arXiv:2402.13276 (cross-list from eess.AS) [ pdf , ps , html , other ]
-
Title: When LLMs Meets Acoustic Landmarks: An Efficient Approach to Integrate Speech into Large Language Models for Depression DetectionSubjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI); Sound (cs.SD)
Abstract: Depression is a critical concern in global mental health, prompting extensive research into AI-based detection methods. Among various AI technologies, Large Language Models (LLMs) stand out for their versatility in mental healthcare applications. However, their primary limitation arises from their exclusive dependence on textual input, which constrains their overall capabilities. Furthermore, the utilization of LLMs in identifying and analyzing depressive states is still relatively untapped. In this paper, we present an innovative approach to integrating acoustic speech information into the LLMs framework for multimodal depression detection. We investigate an efficient method for depression detection by integrating speech signals into LLMs utilizing Acoustic Landmarks. By incorporating acoustic landmarks, which are specific to the pronunciation of spoken words, our method adds critical dimensions to text transcripts. This integration also provides insights into the unique speech patterns of individuals, revealing the potential mental states of individuals. Evaluations of the proposed approach on the DAIC-WOZ dataset reveal state-of-the-art results when compared with existing Audio-Text baselines. In addition, this approach is not only valuable for the detection of depression but also represents a new perspective in enhancing the ability of LLMs to comprehend and process speech signals.
- [1994] arXiv:2402.13284 (cross-list from cs.DB) [ pdf , ps , html , other ]
-
Title: Structure Guided Large Language Model for SQL GenerationSubjects: Databases (cs.DB) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Generating accurate Structured Querying Language (SQL) is a long-standing problem, especially in matching users' semantic queries with structured databases and then generating structured SQL. Existing models typically input queries and database schemas into the LLM and rely on the LLM to perform semantic-structure matching and generate structured SQL. However, such solutions overlook the structural information within user queries and databases, which can be utilized to enhance the generation of structured SQL. This oversight can lead to inaccurate or unexecutable SQL generation. To fully exploit the structure, we propose a structure-to-SQL framework, which leverages the inherent structure information to improve the SQL generation of LLMs. Specifically, we introduce our Structure Guided SQL~(SGU-SQL) generation model. SGU-SQL first links user queries and databases in a structure-enhanced manner. It then decomposes complicated linked structures with grammar trees to guide the LLM to generate the SQL step by step. Extensive experiments on two benchmark datasets illustrate that SGU-SQL can outperform sixteen SQL generation baselines.
- [1995] arXiv:2402.13287 (cross-list from cs.CR) [ pdf , ps , html , other ]
-
Title: Manipulating hidden-Markov-model inferences by corrupting batch dataComments: 42 pages, 8 figures, 11 tablesJournal-ref: Caballero, W. N., Camacho, J. M., Ekin, T., & Naveiro, R. (2024). Manipulating hidden-Markov-model inferences by corrupting batch data. Computers & Operations Research, 162, 106478Subjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Time-series models typically assume untainted and legitimate streams of data. However, a self-interested adversary may have incentive to corrupt this data, thereby altering a decision maker's inference. Within the broader field of adversarial machine learning, this research provides a novel, probabilistic perspective toward the manipulation of hidden Markov model inferences via corrupted data. In particular, we provision a suite of corruption problems for filtering, smoothing, and decoding inferences leveraging an adversarial risk analysis approach. Multiple stochastic programming models are set forth that incorporate realistic uncertainties and varied attacker objectives. Three general solution methods are developed by alternatively viewing the problem from frequentist and Bayesian perspectives. The efficacy of each method is illustrated via extensive, empirical testing. The developed methods are characterized by their solution quality and computational effort, resulting in a stratification of techniques across varying problem-instance architectures. This research highlights the weaknesses of hidden Markov models under adversarial activity, thereby motivating the need for robustification techniques to ensure their security.
- [1996] arXiv:2402.13288 (cross-list from cs.DB) [ pdf , ps , html , other ]
-
Title: Training Table Question Answering via SQL Query DecompositionSubjects: Databases (cs.DB) ; Artificial Intelligence (cs.AI)
Abstract: Table Question-Answering involves both understanding the natural language query and grounding it in the context of the input table to extract the relevant information. In this context, many methods have highlighted the benefits of intermediate pre-training from SQL queries. However, while most approaches aim at generating final answers from inputs directly, we claim that there is better to do with SQL queries during training. By learning to imitate a restricted portion of SQL-like algebraic operations, we show that their execution flow provides intermediate supervision steps that allow increased generalization and structural reasoning compared with classical approaches of the field. Our study bridges the gap between semantic parsing and direct answering methods and provides useful insights regarding what types of operations should be predicted by a generative architecture or be preferably executed by an external algorithm.
- [1997] arXiv:2402.13292 (cross-list from cs.MA) [ pdf , ps , html , other ]
-
Title: A Conflict-Aware Optimal Goal Assignment Algorithm for Multi-Robot SystemsSubjects: Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: The fundamental goal assignment problem for a multi-robot application aims to assign a unique goal to each robot while ensuring collision-free paths, minimizing the total movement cost. A plausible algorithmic solution to this NP-hard problem involves an iterative process that integrates a task planner to compute the goal assignment while ignoring the collision possibilities among the robots and a multi-agent path-finding algorithm to find the collision-free trajectories for a given assignment. This procedure involves a method for computing the next best assignment given the current best assignment. A naive way of computing the next best assignment, as done in the state-of-the-art solutions, becomes a roadblock to achieving scalability in solving the overall problem. To obviate this bottleneck, we propose an efficient conflict-guided method to compute the next best assignment. Additionally, we introduce two more optimizations to the algorithm -- first for avoiding the unconstrained path computations between robot-goal pairs wherever possible, and the second to prevent duplicate constrained path computations for multiple robot-goal pairs. We extensively evaluate our algorithm for up to a hundred robots on several benchmark workspaces. The results demonstrate that the proposed algorithm achieves nearly an order of magnitude speedup over the state-of-the-art algorithm, showcasing its efficacy in real-world scenarios.
- [1998] arXiv:2402.13297 (cross-list from q-bio.QM) [ pdf , ps , html , other ]
-
Title: Integrating Deep Learning and Synthetic Biology: A Co-Design Approach for Enhancing Gene Expression via N-terminal Coding SequencesSubjects: Quantitative Methods (q-bio.QM) ; Artificial Intelligence (cs.AI)
Abstract: N-terminal coding sequence (NCS) influences gene expression by impacting the translation initiation rate. The NCS optimization problem is to find an NCS that maximizes gene expression. The problem is important in genetic engineering. However, current methods for NCS optimization such as rational design and statistics-guided approaches are labor-intensive yield only relatively small improvements. This paper introduces a deep learning/synthetic biology co-designed few-shot training workflow for NCS optimization. Our method utilizes k-nearest encoding followed by word2vec to encode the NCS, then performs feature extraction using attention mechanisms, before constructing a time-series network for predicting gene expression intensity, and finally a direct search algorithm identifies the optimal NCS with limited training data. We took green fluorescent protein (GFP) expressed by Bacillus subtilis as a reporting protein of NCSs, and employed the fluorescence enhancement factor as the metric of NCS optimization. Within just six iterative experiments, our model generated an NCS (MLD62) that increased average GFP expression by 5.41-fold, outperforming the state-of-the-art NCS designs. Extending our findings beyond GFP, we showed that our engineered NCS (MLD62) can effectively boost the production of N-acetylneuraminic acid by enhancing the expression of the crucial rate-limiting GNA1 gene, demonstrating its practical utility. We have open-sourced our NCS expression database and experimental procedures for public use.
- [1999] arXiv:2402.13301 (cross-list from cs.SD) [ pdf , ps , html , other ]
-
Title: Structure-informed Positional Encoding for Music GenerationJournal-ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2024, Seoul, South KoreaSubjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Abstract: Music generated by deep learning methods often suffers from a lack of coherence and long-term organization. Yet, multi-scale hierarchical structure is a distinctive feature of music signals. To leverage this information, we propose a structure-informed positional encoding framework for music generation with Transformers. We design three variants in terms of absolute, relative and non-stationary positional information. We comprehensively test them on two symbolic music generation tasks: next-timestep prediction and accompaniment generation. As a comparison, we choose multiple baselines from the literature and demonstrate the merits of our methods using several musically-motivated evaluation metrics. In particular, our methods improve the melodic and structural consistency of the generated pieces.
- [2000] arXiv:2402.13304 (cross-list from cs.LG) [ pdf , ps , html , other ]
-
Title: Harmful algal bloom forecasting. A comparison between stream and batch learningAndres Molares-Ulloa , Elisabet Rocruz , Daniel Rivero , Xosé A. Padin , Rita Nolasco , Jesús Dubert , Enrique Fernandez-BlancoSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Diarrhetic Shellfish Poisoning (DSP) is a global health threat arising from shellfish contaminated with toxins produced by dinoflagellates. The condition, with its widespread incidence, high morbidity rate, and persistent shellfish toxicity, poses risks to public health and the shellfish industry. High biomass of toxin-producing algae such as DSP are known as Harmful Algal Blooms (HABs). Monitoring and forecasting systems are crucial for mitigating HABs impact. Predicting harmful algal blooms involves a time-series-based problem with a strong historical seasonal component, however, recent anomalies due to changes in meteorological and oceanographic events have been observed. Stream Learning stands out as one of the most promising approaches for addressing time-series-based problems with concept drifts. However, its efficacy in predicting HABs remains unproven and needs to be tested in comparison with Batch Learning. Historical data availability is a critical point in developing predictive systems. In oceanography, the available data collection can have some constrains and limitations, which has led to exploring new tools to obtain more exhaustive time series. In this study, a machine learning workflow for predicting the number of cells of a toxic dinoflagellate, Dinophysis acuminata, was developed with several key advancements. Seven machine learning algorithms were compared within two learning paradigms. Notably, the output data from CROCO, the ocean hydrodynamic model, was employed as the primary dataset, palliating the limitation of time-continuous historical data. This study highlights the value of models interpretability, fair models comparison methodology, and the incorporation of Stream Learning models. The model DoME, with an average R2 of 0.77 in the 3-day-ahead prediction, emerged as the most effective and interpretable predictor, outperforming the other algorithms.
- [2001] arXiv:2402.13326 (cross-list from q-fin.CP) [ pdf , ps , html , other ]
-
Title: Deep Hedging with Market ImpactComments: 13 pages, 5 figuresSubjects: Computational Finance (q-fin.CP) ; Artificial Intelligence (cs.AI)
Abstract: Dynamic hedging is the practice of periodically transacting financial instruments to offset the risk caused by an investment or a liability. Dynamic hedging optimization can be framed as a sequential decision problem; thus, Reinforcement Learning (RL) models were recently proposed to tackle this task. However, existing RL works for hedging do not consider market impact caused by the finite liquidity of traded instruments. Integrating such feature can be crucial to achieve optimal performance when hedging options on stocks with limited liquidity. In this paper, we propose a novel general market impact dynamic hedging model based on Deep Reinforcement Learning (DRL) that considers several realistic features such as convex market impacts, and impact persistence through time. The optimal policy obtained from the DRL model is analysed using several option hedging simulations and compared to commonly used procedures such as delta hedging. Results show our DRL model behaves better in contexts of low liquidity by, among others: 1) learning the extent to which portfolio rebalancing actions should be dampened or delayed to avoid high costs, 2) factoring in the impact of features not considered by conventional approaches, such as previous hedging errors through the portfolio value, and the underlying asset's drift (i.e. the magnitude of its expected return).
Total of 2756 entries :
2-2001
2001-2756